sameroom
2017-02-05 00:15
has joined #json


zehicle
2017-02-05 00:15
testing bot

2017-02-05 00:16
bot reverse

2017-02-05 07:53
@zehicle - thanks for the update. I'll look on Monday morning. Is there a threaded/batched way to engage with these sorts of conversations? Email list, web forum etc? IM is clunky when I'm online maybe 30hrs/week and in the wrong timezone.

2017-02-05 20:43
@gregoryo2008_twitter yes, there's an email list https://groups.google.com/forum/#!forum/digitalrebar

2017-02-05 20:44
we could also add you to the community slack channel (this channel is now cross linked to that one)

2017-02-06 00:43
@zehicle Okay I've joined the forum (crickets!). If I stick with Rebar after this week, I'll talk to you about the Slack channel.

greg
2017-02-06 00:44
@gregoryo2008_twitter: What is going to make your decision?

2017-02-06 00:55
Whoa, actual live chat! Good question - we've been working with Fuel and I'm casting about to see how other systems work. I'm entirely new to OpenStack so I'm not sure if I know enough yet to know what is going to make the decision!

greg
2017-02-06 00:56
our openstack is pretty raw, but making progress.

greg
2017-02-06 00:56
Where are you?

2017-02-06 00:57
Conceptually though, I'm interested to know how we'll end up managing and upgrading our nodes once we have a system in place. Getting it going initially seems like the basic functionality of a deployment system, but those other tasks are going to become quite important, especially since OpenStack has a quick and almost-enforced upgrade cycle.

2017-02-06 00:57
I'm in Western Australia.

2017-02-06 00:58
Puppet is our CMDB of choice, so we're hoping that whatever tool we use can cope with that either being integrated some how, or bolted on afterward.

greg
2017-02-06 00:58
Yeah - those are good questions, that I'm not sure anyone has answered. I think we have a workable story, but we need people to start.

greg
2017-02-06 00:59
We have worked with that in the past. We can drive it or call out to puppet post-provisioning. Either works.

2017-02-06 01:00
Hey have you got any recommendations for a good way to get up and running? I'm working through a Linux Academy training course, and it's built on Icehouse ): Clearly working with it is going to be ultimately the main method, but some cohesive guidance would be good too.

2017-02-06 01:00
Okay, good to hear that Puppet isn't going to be a square peg. Devil in the details, of course.

greg
2017-02-06 01:02
hmm - need to think about it. We may want to let you see our stuff. It will get the basics up, but still needs a little work around actual compute and neutron side.

greg
2017-02-06 01:03
It gets you k8s along the way. :slightly_smiling_face: or :disappointed: depending upon your perspective.

2017-02-06 01:03
k8s?

greg
2017-02-06 01:03
kubernetes - most openstack deployment mechanisms are moving to it to manage the deployment of Openstack.

greg
2017-02-06 01:04
You run the openstack components as containers managed by k8s.

greg
2017-02-06 01:04
Some have the goal of seamless container and vm interactions.

greg
2017-02-06 01:04
We'll see if that comes about.

2017-02-06 01:04
Oh goody, another thing to learn. We've been looking at k8s from afar wondering if we'd need to work out what it actually is.

2017-02-06 01:05
Is there a performance/manageability tradeoff at every turn with these decisions?

greg
2017-02-06 01:05
manageability.

greg
2017-02-06 01:06
performance shouldn't be hurt. Containers appear to be a good trade-off in that area. These are mostly long running containers, so perf isn't really a problem.

greg
2017-02-06 01:06
The manageability comes from the fact that k8s can help with availability and upgrades.

greg
2017-02-06 01:07
k8s works on manifests and you can change a manifest and the system will "automagically" move the containers to new versions. We'll see how well it goes in practice, but ...

greg
2017-02-06 01:07
So, you get some upgrade in there.

2017-02-06 01:08
Very promising in theory. Without knowing how the guts work it really is just magic to me for now.

greg
2017-02-06 01:09
To add fear, the openstack we are playing with is open source stuff with AT&T. They use helm, which is a manifest manager for k8s. It will hopefully get to where the helm charts will have upgrade actions like roll out next version, db upgrades, container updates, and .....

greg
2017-02-06 01:09
Our system will do it all - deploy k8s, helm, ceph, and then openstack.

2017-02-06 01:11
Haven't heard of helm - just found your vid on YouTube about it, will have a watch.

greg
2017-02-06 01:12
:slightly_smiling_face: be kind

2017-02-06 01:13
Portal 2 icon peeking through from the desktop is a nice touch (:

greg
2017-02-06 01:15
All coding makes Greg a sad boy.

2017-02-06 01:18
So I have tried to grok the relationship between Crowbar, OpenCrowbar and Digital Rebar... Crowbar is v1 and looked after by SUSE, OpenCrowbar (v2) became DigitalRebar (v3) - right?

greg
2017-02-06 01:18
Yep

greg
2017-02-06 01:19
V2 was started at Dell but died internally. 2.5 years ago Dell said they were done. Rob left Dell and I joined him again and started DigitalRebar from that base. Victor joined us soon after.

2017-02-06 01:19
Time to feed the :bear:!

greg
2017-02-06 01:20
I left Dell as 2.0 was being designed. I didn't want to deal with the transition to private company.

greg
2017-02-06 01:21
I created Crowbar with Rob. Victor was one of the first to join the team. We've been doing this awhile.

2017-02-06 01:22
Okay, all makes sense. Okay straight up: Watching a couple of videos I'm hearing terms like 'proof of concept' and wondering about maturity and readiness. We're looking to build a production cluster in the next few months (at least some of the team knows heaps more about OpenStack than me). Should we be considering Rebar now?

greg
2017-02-06 01:22
We've learned lots and learn more all the time. Crowbar and some of the 2.0 design have issues. We've been amazed that SUSE continues to drive on. They don't really change Crowbar and live with the warts, but they are going to have issues trying to add additional workloads and support for newer systems.

2017-02-06 01:23
Yeah it's those warts that I don't have a clue about yet.

greg
2017-02-06 01:24
You can't debug an issue easily in CB1.0 and 2.0. DR has more atomic and segregated items.

greg
2017-02-06 01:25
Well, we are brutally honest at times about our state. We have people starting to use us for production with some pretty intense integrations. Should be able to talk about that more soon.

2017-02-06 01:26
It all sounds pretty good, so I'll keep hacking and see how it goes. Gotta go fight with the hardware networking config on my test box now, so I can hopefully browse to my newly installed DR.

greg
2017-02-06 01:26
I think your problem isn't going to be DR, but the stability of the systems you want to deploy. We can help with that.

greg
2017-02-06 01:26
networking always fun

2017-02-06 01:27
Do you have a sense of Fuel's community, software (design, operation etc) and direction?

2017-02-06 01:27
Someone told me that Mirantis are pulling out of Fuel.

greg
2017-02-06 01:28
In some regard, I think they are in flux. They aren't sure what they want to do there.

greg
2017-02-06 01:29
I've never completely bought into fuel. Too merged together and not separable. We deployed fuel to deploy openstack once. Very silly POC.

2017-02-06 01:41
Well that's exactly what we've been doing - last week a colleague created a few dozen VMs as a test - all on OpenStack deployed with Fuel. Seems to be going okay.

2017-02-06 01:42
But ongoing manageability is a question - looks like they use mcollective to kick locally installed Puppet manifests.

2017-02-06 01:43
Anyway, it's been excellent to talk to you, thanks for your time. Time for me to go and learn more.

greg
2017-02-06 01:44
:slightly_smiling_face: have fun.

2017-02-06 02:14
Hey @galthaus can I get an invite to community Slack? (If that's a better comms channel than this...)

2017-02-06 02:25
@gregoryo2008_twitter to my knowledge, old FUEL is not being maintained (it's really just cobbler + puppet). New FUEL is completely different, w/o an upgrade path, and based on salt (and MaaS?). Also, it's OpenStack focused. We've been careful to be general purpose. That adds some complexity but makes us more robust too. That's why we've got ansible, puppet and chef all running together.

wdennis
2017-02-06 02:27
has joined #json

2017-02-06 02:28
@gregoryo2008_twitter we're betting that kubernetes is the better underlay and have been working to support the OpenStack-Helm efforts. It's not all there yet (Greg said the same thing earlier) but could get there pretty fast.

2017-02-06 02:28
https://www.youtube.com/watch?v=wZ0vMrdx4a4&index=2&list=PLXPBeIrpXjfjabMbwYyDULOX3kZmlxEXK

2017-02-06 02:59
What's this talk of 'old' Fuel and 'new' Fuel? There's https://www.mirantis.com/software/openstack/fuel/ and https://fuel-infra.org - is that what you mean?

2017-02-06 03:01
I don't think so. fuel-infra looks like the "big tent" version of old fuel. I think the new one is Fuel-Ccp https://github.com/openstack/fuel-ccp

2017-02-06 03:23
Ah, something else again. Not sure what you mean by big tent - open source community version is how I've heard it described, versus Mirantis' version. Anyway, thank you, this adds more grist for the mill.

2017-02-06 04:11
like most of OpenStack, it's a long story. Are you looking for DIY or a distro?

2017-02-06 04:36
We're looking to be able to deploy OpenStack using some sort of deployment tool that is both easy to get started, and customisable. We want to be able to upgrade it without massive effort every six months. We want to be able to automate as much as possible in ways that we can control - we have Puppet expertise and in-house usage already.

2017-02-06 04:38
I suspect we will want to be able to customise both the deployment platform's settings _and_ the OpenStack that it is deploying and managing. If we can do both with something like Puppet, all the better.

2017-02-06 04:45
I'd be happy to set up a 1x1 to see if there's a fit.

2017-02-06 04:46
IMHO, what you are describing is not just about OpenStack but broader ops. That's our approach. The point of the K8s underlay for Openstack is to help w/ upgrades of OpenStack but it can be used more broadly too.

2017-02-06 04:54
What do you mean a 1x1?

2017-02-06 13:37
a call or meeting instead of via the community chat

2017-02-06 21:54
@wdennis I checked out the redeploy process and recorded a demo of how it should work for the UX and CLI.

2017-02-06 21:54
ubuntu-16.04

2017-02-06 21:56
I had some issues at first because I was using the CLI with "rebar nodes commit X" after I changed the value and that was causing issues if the noderoles were still in process. Skipping the commit is OK for redeploy.

wdennis
2017-02-07 01:58
@zehicle Thanks for the redeploy vid

wdennis
2017-02-07 01:58
Am bringing up a new DR system to try things out on now

2017-02-07 02:09
@zehicle Ah okay, thanks will keep it in mind. Following through the deployment guide and videos for now to get a feel for it.

wdennis
2017-02-07 03:05
Hmmm, installing DR via the "quickstart.sh" script as a non-priv'd user does not seem to work? UI comes up, but no objects within

wdennis
2017-02-07 03:06
Looks like I have all the needed containers running...

wdennis
2017-02-07 03:07

zehicle
2017-02-07 03:10
did you start it before as root? your permissions may be in a bad state

greg
2017-02-07 03:14
@wdennis - cd digitalrebar/deploy/compose

greg
2017-02-07 03:14
docker-compose logs -f rebar_api

greg
2017-02-07 03:14
or

greg
2017-02-07 03:14
docker-compose logs rebar_api > /tmp/rebar.log

greg
2017-02-07 03:15
That can give some hints to where we are at. Hung, or errored or skipped.

2017-02-07 03:16
I'm getting confused about choosing access mode - host or forwarder. I'm trying to create a metal admin node, to manage all metal nodes.

2017-02-07 03:17
"For a Metal or KVM booting dev-test add: --con-provisioner --access=FORWARDER"

2017-02-07 03:17
generally, you want to use --access=HOST

2017-02-07 03:17
unless you are trying to run VMs locally

2017-02-07 03:17
"Host mode ... is useful for systems that are managing ... joined nodes (VMs or physical nodes), or dedicated hosts"

2017-02-07 03:18
which page are you reading? I'll see about updating it. because you also want --con-dhcp

2017-02-07 03:18
http://digital-rebar.readthedocs.io/en/latest/deployment/questions.html#what-access-mode-should-i-use

2017-02-07 03:18
what are you trying to do?

2017-02-07 03:18
http://digital-rebar.readthedocs.io/en/latest/deployment/install/linux.html

2017-02-07 03:18
Create an admin node on metal, to manage systems all on metal.

2017-02-07 03:19
then you'll want host mode.

2017-02-07 03:19
Thanks

2017-02-07 03:19
I did see a --con-dhcp reference somewhere in the docs

wdennis
2017-02-07 03:19
@zehicle negative, downloaded quickstart.sh & then exec'd it as a non-priv'd user (one I made for DR, "dradmin")

zehicle
2017-02-07 03:20
I think you may need that user to have sudoer rights

wdennis
2017-02-07 03:20
Oh, I see @greg responded - stand by, let me look...

wdennis
2017-02-07 03:21
@zehicle yes, the user does have sudo rights (in the "wheel" group in CentOS 7)

2017-02-07 03:21
Ah yes, last question on deployment/questions.html in context of Provisioner - mentions DHCP and --con-dhcp

2017-02-07 03:21
@gregoryo2008_twitter you'll need to make sure you understand your DHCP / Network environment. Do you have another DHCP server on your network?

2017-02-07 03:21
Currently yes, but we're about to turn it off (:

2017-02-07 03:22
Here's a video about setting up DHCP - it MUST match your network environment

2017-02-07 03:22
https://www.youtube.com/watch?v=5YWMlYYuu-s&index=9&list=PLXPBeIrpXjfgurJuwVjZkcfmatCoXYM_v

2017-02-07 03:23
Okay I'll leap ahead and watch that first. I was planning to hit go and then watch vids - I'm up to 002

2017-02-07 03:23
that one is pretty important for bare metal.

2017-02-07 03:23
Yep, makes sense

wdennis
2017-02-07 03:23
@greg Seeing stuff in rebar_api logs like this:

2017-02-07 03:23
forwarder is used when you don't want to leak DHCP to your network

wdennis
2017-02-07 03:24

greg
2017-02-07 03:24
That seems like consul didn't start right.

greg
2017-02-07 03:25
hmmm -

greg
2017-02-07 03:29
@wdennis - have you confessed to me your setup?

wdennis
2017-02-07 03:31
Forgive me father for I have sinned? :wink:

wdennis
2017-02-07 03:32
CentOS 7.3, made a regular user "dradmin", home = /home/dradmin

wdennis
2017-02-07 03:32
Added to "wheel" group which grants sudoer rights

wdennis
2017-02-07 03:33

wdennis
2017-02-07 03:33
(into /home/dradmin)

greg
2017-02-07 03:33
curl | bash

greg
2017-02-07 03:33
curl | bash

greg
2017-02-07 03:33
curl | bash

greg
2017-02-07 03:33
okay or not

wdennis
2017-02-07 03:34
chmod +x, then ./drqs.sh --con-provisioner --con-dhcp --access=FORWARDER

greg
2017-02-07 03:34
hmm

greg
2017-02-07 03:35
Talk to me of your networking.

greg
2017-02-07 03:35
and what are you going to boot off this powerful tool that is DigitalRebar

wdennis
2017-02-07 03:36
Ansible all ran fine, except at end got:

wdennis
2017-02-07 03:36

greg
2017-02-07 03:36
checking something.

wdennis
2017-02-07 03:36
Have an awesome Dell PE2950 (newer one, yeah still old now but wth) with 16GB RAM

wdennis
2017-02-07 03:38
server is just on a single flat network, but am planning on testing w/ KVM nodes on the server itself

greg
2017-02-07 03:38
You might be up.

greg
2017-02-07 03:38

greg
2017-02-07 03:39
nvm - you showed me stuff that shouldn't be up.

greg
2017-02-07 03:39
You can try it, but I don't think it is likely

wdennis
2017-02-07 03:39
I got nuthin'

wdennis
2017-02-07 03:40
(nothing in Deployments, Workloads, Networks, etc.)

greg
2017-02-07 03:40
You login though?

wdennis
2017-02-07 03:41
Yup, that worked

wdennis
2017-02-07 03:41
Except I authd at the web UI login, then got an additional login as so:

greg
2017-02-07 03:42
Yeah - we have challenges at that. We really need to get out of digest auth/ssl mode and move to basic/ssl.

wdennis
2017-02-07 03:42
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F41RWH3PU/screen_shot_2017-02-06_at_9.57.05_pm.png and commented: double login screen

greg
2017-02-07 03:42
It will clean that up.

greg
2017-02-07 03:42
Yeah. We have an issue on it in the UX.

greg
2017-02-07 03:43
We use digest auth, but it really mucks up the browsers and since we switched to single page material design app, it gets confused.

greg
2017-02-07 03:43
Anyway ...

wdennis
2017-02-07 03:43
So lonely with nothing around...


greg
2017-02-07 03:44
can you check top on the system and see if it is pounding.

greg
2017-02-07 03:44
Do you have a system deployment?

wdennis
2017-02-07 03:46
Top looks OK to me...

greg
2017-02-07 03:46
ps auxwww | grep puma

wdennis
2017-02-07 03:46

2017-02-07 03:47
@zehicle Okay the DHCP stuff all makes sense. Regarding the option to collapse "dhcp" and "host" networks into one - what is the benefit of having them separate?

wdennis
2017-02-07 03:47

greg
2017-02-07 03:47
@gregoryo2008_twitter - well - it's style and history

greg
2017-02-07 03:48
@wdennis - something seems to be running very slow

wdennis
2017-02-07 03:48
@greg Ah, but what?

greg
2017-02-07 03:49
@gregoryo2008_twitter - the two ranges act as a discovery limiter for large environments.

greg
2017-02-07 03:49
The dhcp range has a set of short leases and nodes are only in that range until they are assigned a more permanent static-ish IP.
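
A minimal sketch of the two-range idea, assuming the default 192.168.124 network mentioned later in this chat. The range boundaries and the `range_for` helper are made up for illustration and are not DigitalRebar's actual configuration:

```python
import ipaddress

# Hypothetical split of one network into the two ranges discussed above.
# The boundaries are illustrative only.
RANGES = {
    # short leases, used while a node is being discovered
    "dhcp": (ipaddress.ip_address("192.168.124.21"),
             ipaddress.ip_address("192.168.124.80")),
    # more permanent, static-ish addresses
    "host": (ipaddress.ip_address("192.168.124.81"),
             ipaddress.ip_address("192.168.124.254")),
}


def range_for(ip):
    """Return the name of the range containing `ip`, or None if outside both."""
    addr = ipaddress.ip_address(ip)
    for name, (first, last) in RANGES.items():
        if first <= addr <= last:
            return name
    return None
```

Under this sketch a freshly discovered node would sit in the "dhcp" range and later show up in the "host" range once it has been assigned its longer-lived address.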

2017-02-07 03:50
Sure, but it appears that anon leases can be short, while the known ones be longer, even when in the same pool.

greg
2017-02-07 03:50
It is also the case that the ranges act as buffers for people that may not know all the things DHCP on their network.

2017-02-07 03:51
Something for us to keep in mind I guess. Sticking mostly to defaults for now!

greg
2017-02-07 03:51
Yeah - I'm not sure we've tried setting both anon and bound in the same range. It seems like it should work the way you described and what makes sense in my head, but ...

greg
2017-02-07 03:52
@wdennis - hmmm - can you restart the rebar_api container?

greg
2017-02-07 03:52
cd digitalrebar/deploy/compose

greg
2017-02-07 03:52
docker-compose restart rebar_api

greg
2017-02-07 03:53
docker-compose logs -f rebar_api

greg
2017-02-07 03:53
This is slow (but not 49 minutes slow). Rails apps start slowly.

wdennis
2017-02-07 03:58
OK, restarted, but still seeing:

wdennis
2017-02-07 03:58

greg
2017-02-07 03:58
That is "normal"

wdennis
2017-02-07 03:58
Oh good :stuck_out_tongue:

greg
2017-02-07 03:58
That part about the rails app coming up taking forever. That is it.

wdennis
2017-02-07 03:58
Ah

greg
2017-02-07 04:06
how about now?

greg
2017-02-07 04:06
same few lines?

wdennis
2017-02-07 04:07
This is where I'm at now:

wdennis
2017-02-07 04:08

greg
2017-02-07 04:09
Do the disks work? :slightly_smiling_face:

greg
2017-02-07 04:10
umm - sooo - it is trying to load the basic content.

greg
2017-02-07 04:10
You should see something like this:

greg
2017-02-07 04:11
```
rebar_api_1 | trueCalling cmd: /usr/local/entrypoint.d/25-load-initial-workloads.sh
rebar_api_1 | 2017/02/03 21:35:42 [INFO] serf: EventMemberJoin: 4afe50dc6d03 172.17.0.3
rebar_api_1 | Loading the core barclamp metadata
rebar_api_1 | 2017/02/03 21:35:52 [INFO] serf: EventMemberUpdate: 45f9fd79fd96
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/6fusion/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/burnin/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/efk-logging/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/deis/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/openstack/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/helm/rebar.yml
rebar_api_1 | Loading barclamp metadata from /opt/digitalrebar/rackn-workloads/kubernetes/heapster-monitoring/rebar.yml
```

greg
2017-02-07 04:11
kinda like that - that is my private play things, but ...

greg
2017-02-07 04:12
loading core is the first one though. Always.

greg
2017-02-07 04:12
okay - new issue to check.

greg
2017-02-07 04:12
docker-compose ps | grep rule

greg
2017-02-07 04:13
The rule-engine may be having problems - that start up script waits for it to start.

greg
2017-02-07 04:13
docker-compose restart rule-engine

greg
2017-02-07 04:14
might help as well.

wdennis
2017-02-07 04:16

greg
2017-02-07 04:18
docker-compose logs -f rule-engine

wdennis
2017-02-07 04:18
OK, that seemed to kick things, but from rebar_api log:

wdennis
2017-02-07 04:18

greg
2017-02-07 04:19
ok

greg
2017-02-07 04:19
That is okay. apparently dcos workload is now out of date and broke.

wdennis
2017-02-07 04:19
OK

wdennis
2017-02-07 04:20
Here's end of rule-engine log:

wdennis
2017-02-07 04:20

greg
2017-02-07 04:20
That looks better

greg
2017-02-07 04:21
hmm - wonder what caused this timing/startup mess up

wdennis
2017-02-07 04:21
And now we got stuffs in the UI :slightly_smiling_face:

greg
2017-02-07 04:21
The rule-engine hang-up is what was causing your system's problem.

greg
2017-02-07 04:21
again not sure why.

wdennis
2017-02-07 04:22
OK, so to roll back: it is supported to run the DR system as no non-priv'd user (albeit one with sudo rights)?

wdennis
2017-02-07 04:23
*as a non-priv'd user

greg
2017-02-07 04:23
yes

wdennis
2017-02-07 04:23
cool

wdennis
2017-02-07 04:23
right answer :slightly_smiling_face:

greg
2017-02-07 04:24
yeah - I got one finally right

wdennis
2017-02-07 04:24
blind squirrel and all that :wink:

wdennis
2017-02-07 04:27
Hmm... still noting showing under Networks? I should see something there, yes?

wdennis
2017-02-07 04:27
*nothing

2017-02-07 04:28
it can take time to get the networks there - it's part of the init sequence

2017-02-07 04:28
you should be able to watch them from the rebar api log when it uses the API to add networks

wdennis
2017-02-07 04:28
OK

greg
2017-02-07 04:33
They are after the core components.

2017-02-07 05:27
Hmm, that DHCP video said I could change configs before deployment, but they don't seem to have taken effect. I changed IPs in `compose/config-dir/api/config/networks/the_admin.json.mac` and then ran `./run-in-system.sh --deploy-admin=local --access=host --con-provisioner --con-dhcp --admin-ip=$IPA`, now https://$IPA/ux/#/networks/1 shows the default 192.168.124 network.

2017-02-07 05:29
From the web UX I tried to change them, and it gave an error and now "Ranges - Error Loading Ranges"

greg
2017-02-07 05:43
--access=HOST

greg
2017-02-07 05:43
caps matter for that one.
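
A small pre-flight check along these lines could catch the lowercase value before a long install run. This is a hypothetical helper, not part of `run-in-system.sh`; the only values it knows are the two mentioned in this chat (HOST and FORWARDER), and the function name is invented for the example:

```python
# The two access modes mentioned in this conversation; there may be others.
VALID_ACCESS_MODES = ("HOST", "FORWARDER")


def check_access_flag(arg):
    """Validate an --access=VALUE argument and reject wrong-case values."""
    prefix = "--access="
    if not arg.startswith(prefix):
        raise ValueError(f"not an access flag: {arg!r}")
    value = arg[len(prefix):]
    if value in VALID_ACCESS_MODES:
        return value
    if value.upper() in VALID_ACCESS_MODES:
        # Same word, wrong case: point at the upper-case spelling.
        raise ValueError(f"{arg!r}: use upper case, i.e. {prefix}{value.upper()}")
    raise ValueError(f"{arg!r}: expected one of {VALID_ACCESS_MODES}")
```

Calling `check_access_flag("--access=host")` raises an error that suggests `--access=HOST`, which is the fix greg gives here.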

2017-02-07 05:43
Ah, thanks. So, can I rerun and fix this, or should I blow it away (uh, how?) and start again. I could of course reformat the host.

greg
2017-02-07 05:44
with regard to the UX changes, I think we fixed them, but need to rebuild the revproxy container to pull the set of changes.

greg
2017-02-07 05:44
You can rerun and it will blow it away and start over.

2017-02-07 05:44
Excellent

2017-02-07 07:37
Reading http://digital-rebar.readthedocs.io/en/latest/deployment/install/raid-bios.html I can't seem to download MegaCLI version 8.07.14 - only 8.07.07 is available.

2017-02-07 07:40
Heh, some guy at http://techedemic.com is hosting it.

greg
2017-02-07 13:48
hmm - someone else last night had problems, but eventually found them on the sites listed there.

greg
2017-02-07 20:12
come back @wdennis come back

wdennis
2017-02-07 20:12
I'm still joined & sharing...

greg
2017-02-07 20:12
ok

greg
2017-02-07 20:12
np

wdennis
2017-02-07 23:22
Ok, I fail... Trying to make a node in system inventory (i.e. has run sledgehammer) do an install into an existing deployment (which has my desired bootenv set). Went into Nodes, then clicked on "Move Nodes", selected desired deployment, then clicked on "Redeploy Nodes". The node rebooted, went into sledgehammer, and all node roles went green, but there she sits... How to get it to take the desired bootenv change??

wdennis
2017-02-07 23:56
OK, here's what I figured out...
- In Deployments, go to Matrix tab, click the Bind Node Roles icon, then click on "provisioner-os-install" and click the "+" key to do the binding. This will add the node role in blue (proposed state) beside the node. Do the same thing for the "rebar-installed-node" role.
- Then click on the node role, and click the "floppy" icon on each role to commit the proposed role. Once this is done, DR will take the appropriate actions to conform the node to the role.

wdennis
2017-02-08 00:00
So what I don't get is what a "Deployment" is for? I would think it applies a collection of roles to a node, perhaps with per-deployment-specific node role attribute values? I could see the new (changed) ones sitting in "proposed" state until someone commits them, but I don't understand why the new node roles don't get applied automatically to the nodes put into the deployment...

greg
2017-02-08 01:01
Oh - well - we need to talk about this more. This moves to training and operations with regard to what things mean. Can't right now, though; I'll try later.

2017-02-08 04:01
@wdennis this is not in an obvious place... there are some docs about this: http://digital-rebar.readthedocs.io/en/latest/api/common.html under 5.5.1.7.2.

wdennis
2017-02-08 04:29
@zehicle Is the "rebar-installed-node" role a "meta-role" (i.e. an aggregation of roles?)

wdennis
2017-02-08 04:31
Like in Ansible you can have a meta-role that includes "n" other roles into a package; when you use the meta-role it applies the included roles in order defined...

wdennis
2017-02-08 04:33
I see from those API docs that the example was binding the one "rebar-installed-role" role to the node, which was defined as "a useful set of node roles"

wdennis
2017-02-08 04:42
And I see that on http://digital-rebar.readthedocs.io/en/latest/api/common.html in section 5.5.1.7.2.2 that one has to (should?) create a "deployment to deploy the nodes into", but ... why? What exactly is a deployment supposed to be? A collection of nodes that were installed with the same roles/attributes? (I was thinking it was a defined collection of roles that would be applied to a node when a node was put into the deployment)

wdennis
2017-02-08 04:48
There should be a way to define a collection of roles that conforms the target machines to an intended environment, which includes firmware, OS, and systems software, that serves as a base over the group of machines

wdennis
2017-02-08 04:48
Then other roles can be layered on top on selected machines as desired (like the k8s stuff for instance)

greg
2017-02-08 05:10
@wdennis - we have some things to discuss. With regard to DigitalRebar's models and how they are employed to get you where you want to go.

greg
2017-02-08 05:11
First, the objects: Deployments, Nodes, Roles, NodeRoles, DeploymentRoles and Attributes.

greg
2017-02-08 05:12
Deployments = bags of nodes. They are just holding cells that collect nodes. We usually ascribe purpose to a deployment. The system deployment is the holding cell for discovered nodes. RSEnv1 - the nodes in RSEnv1.

greg
2017-02-08 05:13
Nodes = Things DR is operating on. VMs, machines, cloud instances, a linux instance for example.

greg
2017-02-08 05:14
Roles = Atomic functions that we want to apply to a node. These have dependencies and attribute requirements that are met to build a graph of actions to sequence against nodes.

greg
2017-02-08 05:14
NodeRoles = an instance of a Role applied to a Node within a deployment.

greg
2017-02-08 05:15
The matrix tab in the deployment is visualization of this. Nodes are rows, Roles are columns. Cells are NodeRoles.
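
A rough sketch of that matrix as a data structure, assuming nothing about DigitalRebar's real schema. The `NodeRole` class, the node names, and the state labels are invented for illustration; the two role names are the ones that appear elsewhere in this chat:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRole:
    """One cell of the matrix: a role bound to a node (illustrative model)."""
    node: str
    role: str
    state: str  # illustrative labels such as "proposed" or "committed"


def build_matrix(noderoles):
    """Return (nodes, roles, rows): nodes are rows, roles are columns."""
    nodes = sorted({nr.node for nr in noderoles})
    roles = sorted({nr.role for nr in noderoles})
    cells = {(nr.node, nr.role): nr.state for nr in noderoles}
    # An empty string marks a node/role pair with no NodeRole bound.
    rows = [[cells.get((n, r), "") for r in roles] for n in nodes]
    return nodes, roles, rows


EXAMPLE = [
    NodeRole("node-a", "provisioner-os-install", "committed"),
    NodeRole("node-a", "rebar-installed-node", "proposed"),
    NodeRole("node-b", "provisioner-os-install", "proposed"),
]
```

The empty cell for node-b under rebar-installed-node reflects that nodes in the same deployment need not carry the same roles.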

greg
2017-02-08 05:15
DeploymentRoles are special node roles for the Roles within a deployment. More on this in a second.

greg
2017-02-08 05:16
Attributes - typed pieces of information that are stored on the objects.

greg
2017-02-08 05:17
Attributes can live on noderoles, nodes, deploymentroles.

greg
2017-02-08 05:19
When a machine PXE boots to sledgehammer, sledgehammer creates a node in DR and places that node in the system deployment. It also adds the noop role rebar-managed to the node. This role has dependencies that cause the initial 15-ish roles to be added to the node. Sledgehammer marks the node active after putting an SSH key in place.

greg
2017-02-08 05:20
When the system added roles to the node, it created noderoles to represent each role's execution on the node. It also added deployment roles to the system deployment for those roles as well.

greg
2017-02-08 05:20
The annealer (the part that executes noderoles) sequences all the node roles and starts executing them in order.
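
Sequencing by dependency can be sketched as a topological sort. This is not the annealer's actual algorithm, just the general technique; the role names appear in this chat, but the dependency edges below are assumptions chosen for the example:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Maps each role to the roles it requires. These edges are illustrative
# assumptions, not DigitalRebar's real dependency graph.
REQUIRES = {
    "rebar-installed-node": {"provisioner-os-install"},
    "provisioner-os-install": {"rebar-managed"},
    "rebar-managed": set(),
}


def execution_order(requires):
    """Return roles ordered so each one comes after everything it requires."""
    return list(TopologicalSorter(requires).static_order())
```

With a cyclic graph `static_order()` raises `graphlib.CycleError`, which is one reason a role graph has to stay acyclic for this kind of sequencing to work.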

greg
2017-02-08 05:21
To execute a node role, its attribute requirements must be met. All this means is that a role can require an attribute (like what os should I install), the annealer will try to find a value for that by checking things in order.

greg
2017-02-08 05:22
In order by node, noderole, deploymentrole, role. This way you can create defaults by node, by deployment, by global.
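
The lookup order can be sketched as a first-match search over the four scopes. This is only an illustration of the order stated above; the attribute name `target_os`, the scope dictionaries, and the example values are hypothetical:

```python
from typing import Any, Dict, Optional

# Scopes in the order given above: most specific first.
LOOKUP_ORDER = ("node", "noderole", "deploymentrole", "role")


def resolve_attribute(name: str, scopes: Dict[str, Dict[str, Any]]) -> Optional[Any]:
    """Return the first value found for `name`, checking scopes in LOOKUP_ORDER."""
    for scope in LOOKUP_ORDER:
        values = scopes.get(scope, {})
        if name in values:
            return values[name]
    return None  # no scope supplies the attribute


# Hypothetical data: a role-level (global) default and a deployment-level override.
SCOPES = {
    "node": {},
    "noderole": {},
    "deploymentrole": {"target_os": "ubuntu-16.04"},
    "role": {"target_os": "centos-7.3"},
}
```

Here the deployment-level value wins over the role default, and adding `target_os` to the node scope would override both.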

greg
2017-02-08 05:28
When you use the wizard to install the OS, you are really hiding about 5 steps. First, you are creating a deployment for tracking your nodes for this function set. Second, you are moving the nodes into the deployment for safekeeping. Third, you are adding noderoles and deploymentroles to the node and deployment. For the install OS wizard, this is done by adding the rebar-installed-node role (a noop) to the node. This also brings in the provisioner_os-install role through dependencies. Fourth, you are setting the target os attribute on the deployment_role to make it apply to all nodes in the deployment. I think. It could be putting it on the node instead. I'd have to check. Either way, the target os is getting set. Fifth, the deployment and nodes are being committed, causing the actions to take effect.

greg
2017-02-08 05:29
FYI, you can see the deployment roles in the matrix view. They are the column headers. Those links take you to each deployment role's attribute setting page.

greg
2017-02-08 05:30
nodes in a deployment do NOT have to all have the same roles assigned to them.

greg
2017-02-08 05:31
For example, a kubernetes deployment would have a set of etcd nodes, master nodes, and worker nodes. All of these have different node role assignments, but are all held within one deployment to represent the k8s instance.

greg
2017-02-08 05:32
Your Ansible analogy is not correct from an underlying tech perspective, but it does probably reflect what it looks like is going on from the outside and isn't a bad way to think about roles requiring other roles.

greg
2017-02-08 05:33
You have conflated action and configuration in your last part

greg
2017-02-08 05:33
roles are actions, attributes are the configuration.

greg
2017-02-08 05:35
So there are roles to do bios setting, firmware flashing, raid configuration, ipmi configuration, OS installation, deploy kubernetes, ....

greg
2017-02-08 05:36
Attributes, for the instances defined in a deployment, control the configuration of an action for the set of nodes that have been declared to have that action done to them.

greg
2017-02-08 05:36
It turns out there is one more feature that makes more of what you want functional.

greg
2017-02-08 05:36
The Profile.

greg
2017-02-08 05:37
The Profile is an attribute/value list that overrides everything.

greg
2017-02-08 05:37
You can give a node a profile and it will use those attribute values as configuration.

greg
2017-02-08 05:39
This way you could build a profile that declared I want a raid5 volume and a raid10 volume and make the raid10 volume bootable, I want all the components at the latest levels, I want Ubuntu-16.04-ks, I want k8s version 1.5.3, ....
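
One way to picture "overrides everything" is a layered mapping with the profile on top. The sketch below uses Python's `collections.ChainMap`; the attribute keys, the defaults, and the `effective_config` helper are hypothetical, and only the two profile values come from the message above:

```python
from collections import ChainMap


def effective_config(profile, *lower_scopes):
    """Merge attribute maps so that the profile wins over every lower scope."""
    # ChainMap searches its maps left to right, so the first map has priority.
    return dict(ChainMap(profile, *lower_scopes))


# Hypothetical attribute names and default values, for illustration only.
DEFAULTS = {"target_os": "centos-7.3", "k8s_version": "1.5.2", "ntp_server": "pool.ntp.org"}
PROFILE = {"target_os": "Ubuntu-16.04-ks", "k8s_version": "1.5.3"}
```

`effective_config(PROFILE, DEFAULTS)` takes the OS and k8s version from the profile and falls back to the defaults for anything the profile does not mention.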

greg
2017-02-08 05:40
Then you apply all the roles to them. (Yes, we have a road map item that the profile should include a set of roles to apply to the node).

greg
2017-02-08 05:40
Add in some rule engine event handlers and all of this can be automated.

greg
2017-02-08 05:41
It sounds really complex, but it isn't too bad. Just takes a little to get you started with the right frame of reference.

greg
2017-02-08 05:41
That was a dump.

greg
2017-02-08 05:42
Got it? :slightly_smiling_face:

wdennis
2017-02-08 14:56
Epic - a lot to digest here...

wdennis
2017-02-08 14:57
"Digital Rebar - The Missing Manual" :)

greg
2017-02-08 15:34
I like to think of it as the "DigitalRebar - An Epic Journey of a Thousand Poorly Worded Pages That You Can Not Find"

wdennis
2017-02-08 16:28
lulz

wdennis
2017-02-08 16:28
All things in time...

vlowther
2017-02-08 16:55
Yes, and the most important words are scattered about as comments in the source :slightly_smiling_face:

wdennis
2017-02-08 17:16
I created a glossary for myself; do these definitions make sense (or if not, how should I rewrite them?)
* "Annealer" - the system that executes NodeRoles
* "Barclamp" - Collections (associations, "bags") of Roles, each having a "Jig" that implements them, along with Attributes and possibly a wizard
* "Jig" - modular plug-in arch that abstracts configuration managers (i.e. Chef, shell, Ansible, "noop", etc) - "noop" jigs are milestones
* "Sledgehammer" - PXE boot img used to do initial bare-metal Node discovery & inventory
* "Node" - Object that DR operates on; Metal or VM system managed by DR
* "NodeRoles" - an instance of a Role applied to a Node within a Deployment
* "Deployments" - Collections (associations, "bags") of Nodes. We usually ascribe purpose to a Deployment.
* "DeploymentRoles" - special NodeRoles for the Roles included in a Deployment.
* "Workloads" - Barclamps with wizards
* "Profiles" - lets you create a block of Attributes that can be applied to Nodes to override NodeRole settings
* "Roles" - atomic functions (actions) that we want to apply to a Node; have dependencies and Attribute requirements that are used to build a graph of actions to sequence against Nodes; members of a Barclamp and are run by a Jig; can have Parent(s) and Children Roles
* "Attributes" - typed pieces of information that are stored on the objects. Attributes can exist on NodeRoles, Nodes, and DeploymentRoles. They are used as configuration parameters by the Role.

vlowther
2017-02-08 17:25
Looks pretty good.

greg
2017-02-08 17:26
@wdennis - great job on decoding the ramblings

wdennis
2017-02-08 19:32
Hey @greg - every time I hack on the Ubuntu kickstart template and want to give it another try (the installer is currently failing with an error), is it enough to simply reboot the node, or do I have to do the
```
rebar nodes update 12 '{ "bootenv": "sledgehammer" }'
rebar nodes update 12 '{ "bootenv": "ubuntu-16.04-ks-install" }'
```
dance?

greg
2017-02-08 19:35
You have to do the dance to rebuild the static files.

wdennis
2017-02-08 19:35

greg
2017-02-08 19:36
The files are not built upon request.
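
The dance above can be wrapped in a small shell function. This is only a sketch, assuming the `rebar` CLI is on the PATH and already pointed at your admin node. The function name `rebounce_bootenv` and the `REBAR` override variable are made up for illustration; the two `rebar nodes update` calls are the ones quoted in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flip a node to sledgehammer and back to the install
# bootenv so the provisioner rebuilds its static files.
# REBAR is an illustrative override for the binary name, not a rebar feature.
REBAR="${REBAR:-rebar}"

rebounce_bootenv() {
    local node="$1" bootenv="$2"
    if [ -z "$node" ] || [ -z "$bootenv" ]; then
        echo "usage: rebounce_bootenv <node-id> <bootenv>" >&2
        return 2
    fi
    # Step 1: park the node on the discovery image.
    "$REBAR" nodes update "$node" '{ "bootenv": "sledgehammer" }' || return 1
    # Step 2: switch back to the install bootenv, which regenerates the files.
    "$REBAR" nodes update "$node" "{ \"bootenv\": \"$bootenv\" }"
}
```

For the case in the question that would be `rebounce_bootenv 12 ubuntu-16.04-ks-install`, followed by rebooting the node.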

wdennis
2017-02-08 19:36
OK, thanks

2017-02-10 23:35
Hi! I've been watching crowbar for a while, and wondered if the current version would be suitable for running a small (16 node) bare metal lab? I'm basically looking to replace foreman. I'd need the ability to pxe boot, install various OSs - including 'odd' ones like CoreOS or fBSD, and have a nice webui for determining which nodes got (re-)installed with what OS

2017-02-11 00:41
@mech422 yes, that's exactly what we're talking about. CoreOS is an interesting one - totally possible except that we need to discuss if SSH to it is OK or not.

2017-02-11 00:42
here's a video (there are several before this one) that shows the reprovisioning process https://www.youtube.com/watch?v=fFsaOUbmb9g&index=10&list=PLXPBeIrpXjfgurJuwVjZkcfmatCoXYM_v

2017-02-11 00:46
@zehicle ahh - thanks... fBSD is a big one for us as well. a lot of our production nodes are running it

2017-02-11 00:47
I'm just trying to slide rebar in the lab to get a lil PR for it :-)

2017-02-11 00:48
the current list includes debian-7, debian-8 and scientificlinux-6.8. I know that @VictorLowther on our team would know the delta for fBSD.

2017-02-11 00:48
generally, linux flavors are manageable

rstarmer
2017-02-11 00:55
has joined #json

wdennis
2017-02-14 04:23
So, a couple of questions to ruminate on...

wdennis
2017-02-14 04:24
1) How to add new nodes (or reinstall existing rebar-managed nodes) to an existing deployment?

wdennis
2017-02-14 04:26
2) If you want to install a node with a given OS (let's say Ubuntu 16.04) but the only thing you want to change is the disk partitioning, what's the best way to handle that? Separate bootenvs with different templates, or the same bootenv with different templates?

wdennis
2017-02-14 04:37
3) Is there a way to compose templates where for instance you want to use the same basic overall master template, but insert sections inside that template which themselves could be templates? Again like for maintaining different standard disk partitioning layouts, multi-disk setups, default root passwords, etc.?

zehicle
2017-02-14 15:39
1) I assume you mean ones that are not in the database? You can use the rebar-join script from the node or outside it if you have SSH access. You can also use the API/CLI to create the nodes in advance so they are pre-identified when they show up. 2) @vlowther may have better idea. I think that can be handled by profiles where the attribs are different in each profile 3) that's another question @vlowther but I believe yes.

vlowther
2017-02-14 15:50
2) There are a couple of different ways to handle that. Quickest is to have a new bootenv and a new seed template. Most flexible is to abstract the partitioning stuff out into a per-node attrib, and expand that in the template.

2017-02-14 15:52
3) Not as of yet, no. It is a good ask, tho -- the Go text/template language allows for such things, but I would need a reasonable way to expose that functionality.

2017-02-14 16:21
@mech422 I have not messed with freebsd in... a long time. The last time I tried playing around with any of the *bsds was approx. NetBSD 1.5.

2017-02-14 16:22
That said, if there is a way to do an unattended install that uses PXE and does not use NFS, we can probably handle it.

2017-02-16 05:24
I deployed rebar on an instance in AWS and then deployed k8s with rebar. How can I access the k8s dashboard ui? How can I log into the k8s linux instance? It takes the username ubuntu but asks for a password.

greg
2017-02-16 14:30
https://<IP of master node in k8s>/ui should get you the dashboard.

greg
2017-02-16 14:31
You will need to have https open for that security group.

greg
2017-02-16 14:31
From the AWS digitalrebar node, you should be able to log in to the other nodes as root.

greg
2017-02-16 14:32
@tnkumar - hopefully, that helps.

greg
2017-02-16 15:45
Test

2017-02-16 15:47
@mech422 - strange. How many nodes are you booting at once? What is this running on? I've seen something like this before. Our simple go-based tftp server wigs out in some environments that I don't understand.

2017-02-16 15:48
- connection between gitter and slack fixed - sorry about the delays

2017-02-16 15:48
gitter users are welcome to guest accounts on slack

2017-02-16 15:48
what I mean is what is the base OS of your admin node.

2017-02-16 15:49
thanks @zehicle

2017-02-16 15:51
to create Slacks, just use https://sameroom.io/SLrYards

2017-02-16 15:51
@zehicle I just tried iptables -F and rebooted the 'test server' - still got dhcp and timeout with tftp

2017-02-16 15:52
@zehicle I'm only booting 1 'test server' at a time atm - I have a total of 16 test nodes available

greg
2017-02-16 15:53
@mech422 - what is your base OS? I think I can give you a workaround.

2017-02-16 15:54
Ubuntu 16.04

greg
2017-02-16 15:56
convenient

greg
2017-02-16 15:56
Let's try this.

greg
2017-02-16 15:56
We are going to do two things.

greg
2017-02-16 15:56
First we are going to move the tftp port that the provisioner is using to something else.

greg
2017-02-16 15:56
To do this, we need to do the following:

greg
2017-02-16 15:57
cd digitalrebar/deploy/compose

greg
2017-02-16 15:57
vi compose.env

greg
2017-02-16 15:57
add to the end of the file.

greg
2017-02-16 15:57
TFTPPORT=6699

greg
2017-02-16 15:57
save the file

2017-02-16 15:57
end of file? That file doesn't exist for me?

greg
2017-02-16 15:57
oops - common.env
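
The edit described here (add `TFTPPORT=6699` to the end of `common.env`) can also be scripted so that re-running it does not append the line twice. A minimal sketch, assuming `common.env` holds plain `KEY=value` lines; `set_env_var` is a made-up helper name, not part of Digital Rebar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=value in an env file, replacing an existing
# KEY= line or appending one if the key is not present yet.
# Intended for simple values such as port numbers (no '|' or '&' in the value).
set_env_var() {
    local file="$1" key="$2" value="$3" tmp
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        tmp="$(mktemp)" || return 1
        # Rewrite the existing line, then copy back so file permissions are kept.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" \
            && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (run from digitalrebar/deploy/compose):
#   set_env_var common.env TFTPPORT 6699
```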

2017-02-16 15:58
ok - added

greg
2017-02-16 15:58
docker-compose restart provisioner

greg
2017-02-16 15:59
once that is done.

greg
2017-02-16 15:59
you should be able to do:

greg
2017-02-16 15:59
ps auxww | grep provisioner-mgmt

2017-02-16 15:59
rebar@bernie:~/digitalrebar/deploy/compose$ sudo netstat -tulpn | grep 69
tcp6 0 0 :::443 :::* LISTEN 18699/docker-proxy
udp6 0 0 :::69 :::* 25070/provisioner-m

greg
2017-02-16 15:59
it should look like this:

greg
2017-02-16 15:59
root 137266 0.0 0.0 1506036 29616 ? Sl Feb12 4:50 provisioner-mgmt --api-port 8092 --static-ip 136.179.33.28 --static-port 8091 --tftp-port 6699 --file-root /tftpboot

2017-02-16 16:00
still says 69....

2017-02-16 16:00
let me double check I saved the file

greg
2017-02-16 16:00
yeah - thinking .

greg
2017-02-16 16:00
okay - try this.

greg
2017-02-16 16:00
docker-compose stop provisioner ; docker-compose rm -f provisioner ; docker-compose start provisioner

2017-02-16 16:03
hmm - it keeps saying 'ERROR: no containers to start'

greg
2017-02-16 16:04
what directory are you in?

2017-02-16 16:04
dr/deploy/compose

greg
2017-02-16 16:04
flu induced haze.

greg
2017-02-16 16:04
docker-compose up -d provisioner
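
Put together, the sequence that replaces the plain `restart` here is stop, remove, then `up -d`, so the container is recreated and picks up the changed env file. A sketch of that as one function, assuming it is run from the `deploy/compose` directory; `recreate_service` and the `COMPOSE` override are made-up names for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recreate a compose service so it re-reads its env files.
# COMPOSE is an illustrative override for the docker-compose binary.
COMPOSE="${COMPOSE:-docker-compose}"

recreate_service() {
    local svc="$1"
    [ -n "$svc" ] || { echo "usage: recreate_service <service>" >&2; return 2; }
    "$COMPOSE" stop "$svc" || return 1      # stop the running container
    "$COMPOSE" rm -f "$svc" || return 1     # remove it so 'up' creates a fresh one
    "$COMPOSE" up -d "$svc"                 # create and start in the background
}

# Example: recreate_service provisioner
```

In this conversation `DR_TAG` also had to be exported before `up` could find the image.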

2017-02-16 16:05
ahh - rebuilding the image

greg
2017-02-16 16:05
you need to run another tftp server

2017-02-16 16:06
'another' as in tftpd-hpa ? or another dr container ?

2017-02-16 16:07
ohhh... that docker-compose up bombed

2017-02-16 16:07
'Image digitalrebar/dr_provisioner not found'

greg
2017-02-16 16:07
export DR_TAG=latest

greg
2017-02-16 16:07
yes tftpd-hpa

2017-02-16 16:08
this server is my foreman server too - so I have tftpd-hpa, isc-dhcpd, and bind9 already installed and working...I can restart them if needed

greg
2017-02-16 16:08
export DR_TAG=master

greg
2017-02-16 16:09
Let's get the provisioner running again.

greg
2017-02-16 16:09
It sounds like you know how to run tftpd-hpa already.

greg
2017-02-16 16:10
I need you to serve the .cache/digitalrebar/tftpboot directory.

greg
2017-02-16 16:10
I use this
```
foundry@master-admin:~/digitalrebar/deploy/compose$ cat /etc/default/tftpd-hpa
# /etc/default/tftpd-hpa
TFTP_USERNAME="tftp"
TFTP_DIRECTORY="/home/foundry/.cache/digitalrebar/tftpboot"
TFTP_ADDRESS="[::]:69"
TFTP_OPTIONS="--secure"
```

2017-02-16 16:10
ok - provisioner is back up

2017-02-16 16:10
rebar@bernie:~/digitalrebar/deploy/compose$ sudo netstat -tulpn | grep 69
tcp6 0 0 :::443 :::* LISTEN 18699/docker-proxy
udp6 0 0 :::6699 :::* 10089/provisioner-m

greg
2017-02-16 16:10
Cool

2017-02-16 16:10
lemme restart tftpd-hpa real quick

greg
2017-02-16 16:11
make sure to change its base directory to the digitalrebar place.

2017-02-16 16:13
ok - dir changed, perms checked and service restarted

2017-02-16 16:13
shall I go restart the test node ?

greg
2017-02-16 16:13
yes

2017-02-16 16:17
odd - that gave a timeout too... I think I need to change the ip on tftpd-hpa - I had it pinned to the pxe interface, but I think DR is handing it the mgmt ip

greg
2017-02-16 16:17
probably

2017-02-16 16:18
ok - its listening on 0.0.0.0:69 now - lemme go reboot

greg
2017-02-16 16:18
we use the admin-ip flag

2017-02-16 16:21
yeah - I used that to specify the 'web ip' as I wasn't sure what was what...

2017-02-16 16:21
my 'mgmt' network and 'pxe' network are diff

2017-02-16 16:21
ok - so that got me pxelinux

greg
2017-02-16 16:21
okay - should boot into sledgehammer

greg
2017-02-16 16:21
a centos 7 ram image.

2017-02-16 16:21
but it seems to be stuck

greg
2017-02-16 16:21
The node should appear in the UX.

greg
2017-02-16 16:21
okay - sooooo

greg
2017-02-16 16:22
with regard to networking

greg
2017-02-16 16:22
is the pxe network routable to the mgmt network and vice versa?

2017-02-16 16:22
hmm - tftpd: read: Connection refused ?

2017-02-16 16:23
its sorta like a firewall setup...I have diff interfaces on the box - 1 for the 'pxe network' , 1 for the 'ipmi network' and 1 for the 'mgmt' network

2017-02-16 16:23
the rest of my lab gear can only hit the 'mgmt' network directly, so I need the web interface on that

2017-02-16 16:24
I have ipv4.ip_forward on, so I shouldn't need any routes, right ?

greg
2017-02-16 16:24
did you configure networks?

greg
2017-02-16 16:24
in digitalrebar?

2017-02-16 16:24
Umm - I tried to... I never got a 'the_bmc' network...

2017-02-16 16:25
but I added a 'pxe network' - and setup dhcp on it - it appears to be working, since we got dhcp boot

2017-02-16 16:25
last time I re-ran the install, I lost the 'admin-internal' network, though I don't know if thats used for anything

greg
2017-02-16 16:25
example more than anything else.

greg
2017-02-16 16:26
did you add a router in the admin network?

2017-02-16 16:26
tftpd-hpa is listening on all IPs now, so even if ips are baked into the DR boot stuff, it should respond....

2017-02-16 16:27
no routers added anywhere

greg
2017-02-16 16:27
okay - so my guess is that you gave a pxe network address to the node, and it tried to tftp to a mgmt-network ip.

greg
2017-02-16 16:27
it doesn't have a route for that.

greg
2017-02-16 16:28
Two ways to fix this.

2017-02-16 16:29
it shouldn't need a route? as both interfaces are on the same box ?

greg
2017-02-16 16:29
one is to add the admin node's PXE IP as the router for the network in the network page.

2017-02-16 16:29
ok - I can do that

greg
2017-02-16 16:29
it doesn't know where to send it though.

2017-02-16 16:30
I figured that 'linux responds to arps for all ips' thing would handle that - let me add the router and restart it :-)

greg
2017-02-16 16:30
the other is to rerun the startup script with the PXE ip as the admin ip. This will clean things up; you can then add your network back. We listen to 0.0.0.0 for ui and api functions.

2017-02-16 16:31
ok - if this works, I'll re-install like that

2017-02-16 16:31
brb

greg
2017-02-16 16:31
I'm going offline for a while. Need to nap. Later

2017-02-16 16:32
thanks :-)

2017-02-16 16:32
sledgehammer is loading :-)

2017-02-16 18:04
@zehicle Weird - I stopped everything, purged everything, pulled fresh from git again, and ran: ./run-in-system.sh --help --con-provisioner --con-dhcp --account=rebar --access=HOST --admin-ip=192.168.51.70/24 --deploy-admin=local

2017-02-16 18:05
admin-internal still came up with the wrong ip ranges - but once I changed it to do dhcp for 192.168.51.0/24 - dhcp AND TFTPD work ?!?

greg
2017-02-16 18:15
yeah - we think there are some nat docker tftp issues when not all the ips align.

2017-02-16 18:18
oh - wb :-) How was your nap ?

2017-02-16 18:20
I think I only have 2 more things to fix before I can start provisioning: the 'discovery' stuff and the BMC/IPMI stuff

2017-02-16 18:20
I can see it booting discovery/vmlinuz - and it comes up to a centos login... but I'm not seeing any 'discovered' nodes in the webui

zehicle
2017-02-16 19:15
@mech422 that could mean that the booted system cannot connect back to the admin server to register

zehicle
2017-02-16 19:15
it would do it via port 443

2017-02-16 19:24
rebar@bernie:~/digitalrebar/deploy/compose$ sudo netstat -tulpn | grep 443
tcp6 0 0 :::443 :::* LISTEN 31737/docker-proxy

2017-02-16 19:25
its only listening on ipv6 - but tftp was the same, and recognized ipv4 sooo...

2017-02-16 19:25
I added the router on the network

2017-02-16 19:26
oh - should the router have a /24 or /32 netmask ? the field auto-fills with a /32 on the end - but the admin-internal network creates a router with a /24 ?

greg
2017-02-16 19:36
24

2017-02-16 19:39
does the discovery image have a login ?

greg
2017-02-16 19:40
root/rebar1

greg
2017-02-16 19:40
journalctl -u sledgehammer

greg
2017-02-16 19:40
often has useful info

2017-02-16 19:43
dam, I was close - I tried rebar/rebar1 :-P

2017-02-16 19:44
I think its a dns failure - its setting domain name to local.neode then dying with an unknown hostname a lil bit further down

2017-02-16 19:45
doesn't appear it got as far as trying to talk to the admin node

2017-02-16 19:57
'Unable to connect to Rebar: unable to verify existance of machine-install user: Get https://192.168.40.70/api/v2/machine-install'

2017-02-16 19:58
looks like its banging it by IP

2017-02-16 20:05
Hmm..that's weird - none of the interfaces have an ipv4 address - eth0 just has an ipv6

2017-02-16 20:11
dhclient says it got a (correct) dhcp address and bound it to eth0

2017-02-16 20:12
but ip address show eth0 only has an ipv6

2017-02-16 20:22
re-ran dhclient and got eth0 an ipv4 address, but its still not happy - its dying around line 105 of /tmp/startup.sh

2017-02-16 20:23
(I can't manually create nodes thru the webui either, if that tells you anything ?)

zehicle
2017-02-16 20:51
the webui only can create with providers

zehicle
2017-02-16 20:51
you can use the CLI to inject nodes

zehicle
2017-02-16 20:52
docker logs -f compose_rebar_api_1 may show you errors if it's related to the node create API

2017-02-17 15:07
Hi, I've been following the tutorial locally, but would like to test it out with packet.net. However, is the rackn100 code still valid?

greg
2017-02-17 15:45
@grealish - I thought so, but @zehicle would know better.

greg
2017-02-17 15:45
@mech422 - Thinking about your issue. Can you send me a snippet with the journalctl -u sledgehammer

greg
2017-02-17 15:55
@mech422 - I sometimes see issues like this with dueling dhcp servers. We get an initial load image, but then get a different dhcp response in sledgehammer and it is missing some parameters. Maybe?

2017-02-17 16:34
@galthaus Umm - I'd love to - can you give me a couple of hours? Slammed with the Friday morning meetings atm

greg
2017-02-17 17:24
I'm on and off. The whole flu thing is maybe done, but pretty fatigued so I come and go.

2017-02-17 20:22
@grealish yes! but I think it's good for $35


2017-02-17 20:27
connecting this to #digitalrebar on freenode




zehicle
2017-02-17 21:10
testing IRC connection


zehicle
2017-02-17 21:14
hello IRC users?!

2017-02-17 21:14
yes, IRC connection is working.

2017-02-17 22:05
sweet

2017-02-18 15:48
Morning

greg
2017-02-18 15:52
hi

2017-02-18 15:52
@zehicle Have you seen https://tumblr.github.io/collins/index.html ?

greg
2017-02-18 15:53
Well this is @greg

2017-02-18 15:53
oh sorry! Morning :-)

greg
2017-02-18 15:54
np. I suspect that all the cross connecting of apps will get things confused, but we'll see. :slightly_smiling_face:

2017-02-18 15:55
heh - yeah, we're sort of spoiled for choices lately - irc, slack,gitter, etc etc

2017-02-18 15:57
I've gotta figure out a cloud-init script to boot our 'standard' company golden images on openstack with ldap and everything working...

2017-02-18 15:57
hopefully, I should be done with that this weekend, and can get back into DR beginning of the week

greg
2017-02-18 15:58
okay - cool - good luck with that.

2017-02-18 15:59
thanks - regarding the sledgehammer logs - I'm 90+% certain theres only 1 dhcp server on the vlan...I ran dhclient manually from sledgehammer, and wasn't surprised by the ip it said responded

greg
2017-02-18 16:00
ok - the next step will be things like cat /proc/cmdline and some others to make sure all the options got sent with the right values.

2017-02-18 16:01
yeah - just re-ran dhclient and its the right dhcp server ip thats responding - ok, I can check the options real quick if ya like - i just ran it again

greg
2017-02-18 16:02
From the sledgehammer root login, cat /proc/cmdline

greg
2017-02-18 16:02
that should have sent some stuff.

greg
2017-02-18 16:03
cat /var/lib/dhclient/dhclient.leases

greg
2017-02-18 16:03
That should show the DHCP options.

greg
2017-02-18 16:04
Those are the inputs for the script.

2017-02-18 16:05
dhclient has multiple entries since I re-ran it multiple times - its the last entry in the file for current run, right?

greg
2017-02-18 16:05
it will probably try and match them all.
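
When the leases file has piled up entries from repeated `dhclient` runs, it can help to look at only the most recent one. A minimal sketch, assuming the usual ISC dhclient layout where each entry starts with a line beginning `lease {`; `last_lease` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the last "lease { ... }" entry of a
# dhclient leases file.
last_lease() {
    awk '
        /^lease[ \t]*\{/ { buf = "" }   # a new entry starts: forget the previous one
        { buf = buf $0 "\n" }           # accumulate lines of the current entry
        END { printf "%s", buf }        # whatever is left is the final entry
    ' "$1"
}

# Example: last_lease /var/lib/dhclient/dhclient.leases
```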

2017-02-18 16:06
blah - sledgehammer doesn't allow root ssh - lemme go create a user



2017-02-18 16:11
hmm - I could blow away the leases file and re-run dhclient?

greg
2017-02-18 16:14
ah = yes

2017-02-18 16:14
the last lease entry got cut off somehow - but it was basically the same as the one above it - option dhcp-rebinding-time, domain name, rebind, renew and expire times

greg
2017-02-18 16:14
on the adin nod

greg
2017-02-18 16:14
on the admin node, cd ~/.cache/digitalrebar/tftpboot

greg
2017-02-18 16:14
your addresses are 192......

greg
2017-02-18 16:15
ls C0*


greg
2017-02-18 16:16
remove all the 192.* and C0* files

2017-02-18 16:17
<root@bernie>:/home/rebar/.cache/digitalrebar/tftpboot# rm 192.168.51.200.ipxe C0A833C8.conf

greg
2017-02-18 16:17
yes and then there is a C0 file in pxelinux.cfg directory
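
The `C0...` names are the node's IPv4 address written as eight uppercase hex digits, which is the per-host file name PXELINUX looks for; that is why 192.168.51.200 shows up as `C0A833C8`. A small sketch for computing the name from an address; `ip_to_hex` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a dotted-quad IPv4 address to the 8-digit
# uppercase hex form used in per-host PXE config file names.
ip_to_hex() {
    local IFS=.
    # Intentional word splitting: break the address into its four octets.
    set -- $1
    [ "$#" -eq 4 ] || { echo "expected a dotted-quad IPv4 address" >&2; return 1; }
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Example: list the per-host files for one node from the tftpboot directory
#   ls "$(ip_to_hex 192.168.51.200)"* pxelinux.cfg/"$(ip_to_hex 192.168.51.200)"*
```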

2017-02-18 16:17
ok - removed

greg
2017-02-18 16:19
reboot the node

greg
2017-02-18 16:19
the node, not the admin node

2017-02-18 16:19
client/test node rebooting

2017-02-18 16:22
Hey! we have a node :-)

2017-02-18 16:23
and its in error state now :-)

2017-02-18 16:24
looking at the node - all steps up to 'rebar-managed-node' are green

2017-02-18 16:24
'rebar-managed-node' is yellow but 'amt-discover' is green

2017-02-18 16:25
bios-discover is yellow (HP Proliant G5 blades)

2017-02-18 16:25
ipmi-discover is yellow

2017-02-18 16:25
raid-tools-install is red

2017-02-18 16:25
raid-discover is yellow

2017-02-18 16:27
oh wow - gitter inlines gists ?

greg
2017-02-18 16:33
You need to read some more. :slightly_smiling_face:

greg
2017-02-18 16:33
The hardware tools need some downloads that we can't provide because of licensing.

greg
2017-02-18 16:34
yellow means pending - red means error - green means done.

greg
2017-02-18 16:34
raid-tools-install is the predecessor of most of the discovers.

2017-02-18 16:35
cool - I'll see if I can fuxor it around till its happy

greg
2017-02-18 16:35
```
Gregs-MacBook-Pro-2:provisioner galthaus$ ls ~/.cache/digitalrebar/tftpboot/files/raid/
8.07.14_MegaCLI.zip SAS2IRCU_P19.zip SAS2IRCU_P20.zip
```

greg
2017-02-18 16:36
MegaCLI and P20. There are links in the docs. Also if you click on the red icon, it may give you the link to the file to download.

2017-02-18 16:36
Can I just reboot the node after I get the firmware blobs ?

greg
2017-02-18 16:36
You can, or you can click the retry button on the red icon. If you get to the annealer, upper right button/icon in ux.

greg
2017-02-18 16:37
the errors are separate and there is a retry all button there.

2017-02-18 16:37
(bios update / configuration is a big one for me - we have 10k-15k metal nodes in 80 pops)

2017-02-18 16:37
sweet! thanks for the help!

greg
2017-02-18 16:37
okay - that is cool and good for us to hear. We may want to have a conversation at some point about support and hardware types.

2017-02-18 16:38
sure - I'd be happy to provide any info I can

2017-02-18 16:38
we're mostly a supermicro shop (looks like we'll be transitioning to dell in the future)

greg
2017-02-18 16:38
At that scale and size, you may want some consulting and contractual support.

greg
2017-02-18 16:39
We have the beginning of supermicro support and great dell support.

2017-02-18 16:39
Time to feed the :bear:!

2017-02-18 16:39
yeah - thats gonna be a hard sell, but we DO employ 2-3 full time fBSD core committers on staff

2017-02-18 16:39
so might be able to work something out that way - we do give back to stuff we use

greg
2017-02-18 16:40
ok

2017-02-18 16:41
anyway, I'll be happy to provide any info I can

greg
2017-02-18 16:42
cool

2017-02-18 16:42
once we get going, maybe I can get my boss talking to you guys

greg
2017-02-18 16:42
yeah - that is fine.

2017-02-18 16:44
@mech422

2017-02-19 20:25
@galthaus Just a heads up - after following: http://digital-rebar.readthedocs.io/en/latest/deployment/install/raid-bios.html the 'raid tools install' step is still red, and 'bios-discover' is still yellow

greg
2017-02-19 20:25
It has a log in the node role. It will probably tell you what is wrong.

2017-02-19 20:26
@galthaus Also, am I understanding correctly that rebar needs to control/configure the BMCs ? my IPMI/BMC network is a separate DHCP network, currently not under rebar control

greg
2017-02-19 20:27
It won't since you don't have one configured.

greg
2017-02-19 20:27
It will look at your current settings though.

greg
2017-02-19 20:27
I think.

2017-02-19 20:27
@galthaus err... you mean a log on the node? or a log named after the node on the admin node ?

greg
2017-02-19 20:27
In the UI, you can click into the red icon and see the log of what went wrong.

2017-02-19 20:29
hmm - looks like the rpm version got bumped: caution: filename not matched: Linux/MegaCli-8.07.14-1.noarch.rpm

2017-02-19 20:29
lemme unpack the zip and see whats in there

greg
2017-02-19 20:29
hmm - thought I fixed this, potentially.

2017-02-19 20:31
oh - the download name was different too - it downloaded as 'Linux_MegaCLI_8.07.07.zip' so I symlinked it to '8.07.14_MegaCLI.zip'

2017-02-19 20:31
yeah - it extracts to 'MegaCli-8.07.07-1.noarch.rpm'

2017-02-19 20:34
btw - do I also need to download the dell/supermicro/etc bios tools ?

greg
2017-02-19 20:34
you will need the sum tool, I believe.

2017-02-19 20:45
grrr...this boils down to broadcom's site sucks, I think :-P the search doesnt seem to find 8.07.14 and some results claim MegaCLi 5.5 P2 is 'latest'

greg
2017-02-19 20:45
yes - I had the exact link earlier.

greg
2017-02-19 20:47
I use this

greg
2017-02-19 20:47

greg
2017-02-19 20:48
Then rename it to: 8.07.14_MegaCLI.zip in files/raid under the tftpboot directory.


greg
2017-02-19 20:49
is the other one.

greg
2017-02-19 20:50
In fact, updating barclamp info shortly.

2017-02-19 20:51
thanks - rebooting client node

2017-02-19 20:56
Woot! Node shows all green upto 'firmware flash'

greg
2017-02-19 20:56
cool

2017-02-19 20:57
'firmware-flash' 'ipmi-configure' and 'rebar-hardware-configured' are all blue

2017-02-19 20:57
guess I need the bios flash tools now ?

greg
2017-02-19 20:59
no

greg
2017-02-19 20:59
They are waiting to run.

greg
2017-02-19 20:59
It indicates that they are awaiting config.

greg
2017-02-19 21:00
You can commit those and they will run with their current config or you can make changes.

greg
2017-02-19 21:00
At this point, you usually add a workload (os install, or k8s or whatever) and when you commit, it pushes the node through the rest of the process.

2017-02-19 21:01
ahh - lets see if I can do an OS install...

greg
2017-02-19 21:01
firmware-flash may not do anything if nothing matches. ipmi-configure will set the root password to cr0wBar! and configure an IP if there is a BMC network.

2017-02-19 21:03
Hmm - I like IPMI - saves me having to walk around rebooting stuff - I'll probably have to redo my ipmi network to play nice with rebar though

2017-02-19 21:18
well, that was painless :-)

2017-02-19 21:18
both foreman and rebar end up with unbootable systems on the HP nodes though

2017-02-19 21:19
something funky with the bios/grub I think - oddly enough, it works fine if you install manually via the installer

2017-02-19 21:21
Firing up one of the dell nodes now

greg
2017-02-19 21:27
What OS? Probably a tweak for the kickstart. We would like to know more about that.

greg
2017-02-19 21:27
Batman movie time

2017-02-19 21:27
ohhh - sounds good - I'm waiting for Dr. Strange to hit Vudu

2017-02-19 21:28
its ubuntu 16.04 (for some reason, thats the only OS option I get - could have sworn the Centos iso was there too)

2017-02-19 21:28
I'll try and track down the preseed issue - its gotta be some setting in there somewhere

2017-02-19 21:37
Hmm - Dell C6100 cloud server nodes not booting after install either

2017-02-19 21:37
starting to think ubuntu 16.04 does a crap job with grub :-P

2017-02-19 22:29
I think I FOUND IT!

2017-02-19 22:29
the hp machine was installed with no partition flagged as bootable

2017-02-19 22:30
I marked /boot as bootable and it seems to be fine

2017-02-19 22:32
gonna check the Dell node now

2017-02-19 23:06
err - whats the default login for machines provisioned as Ubuntu 16.04 ? I've tried 'rebar/rebar', 'rebar/rebar!', 'rebar/cr0wBar!', 'root/rebar' , 'root/rebar!', 'root/Cr0wBar!'....

greg
2017-02-19 23:52
rebar/rebar1

greg
2017-02-19 23:53
If we configure raid, then we mark the drive bootable.

2017-02-20 00:09
yeah - I tried rebar/rebar1 - but I'm rebooting and I'll give it another try

2017-02-20 00:09
I couldn't figure out which preseed is actually being used - theres 4 or 5 of them in the tftpd ubuntu/preseed dir

greg
2017-02-20 00:10
regardless. From the admin node, root passwordless should work.

greg
2017-02-20 00:10
if that doesn't work, then add your ssh key to the rebar-access data attribute and rerun the role.

greg
2017-02-20 00:10
That will allow you root access.

2017-02-20 00:11
I was trying to figure out how you set attributes....

greg
2017-02-20 00:11
propose the deployment the node is in.

greg
2017-02-20 00:11
Edit the attribute.

2017-02-20 00:11
ahh - ok

greg
2017-02-20 00:11
Commit the deployment.

2017-02-20 00:13
yeah - no bootable partition on the dell node either

2017-02-20 00:13
not sure why it even booted ?

2017-02-20 00:13
and it keeps powering off - hope its not a hardware problem

2017-02-20 00:15
gitter to IRC bridge test....

2017-02-20 00:15
damn - it's not a three way thing

2017-02-20 00:16
IRC to gitter

greg
2017-02-20 00:16
Oh - we power things off.

2017-02-20 00:16
gitter to irc

2017-02-20 00:16
nm-> it's working. user error

greg
2017-02-20 00:16
@mech422 - DR powers things off by default.

greg
2017-02-20 00:16
if IPMI is configured and usable, DR will power off nodes that are "bored".

greg
2017-02-20 00:17
You can change that with stay_on attribute on the node. It defaults to false. Set it to true and it will keep the node on.

2017-02-20 00:25
ok - so the boot issues appear to just be the /boot partition not being marked bootable in the MBR
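
One way to confirm this on an installed node is to look for the `*` in the Boot column of `fdisk -l`. A minimal sketch, assuming MBR-style `fdisk -l` output on stdin; `has_boot_flag` is a made-up helper name and `/dev/sda` below is a placeholder for the actual disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `fdisk -l` output on stdin and succeed if any
# partition line carries the boot flag ("*" in the second column).
has_boot_flag() {
    awk '$1 ~ /^\/dev\// && $2 == "*" { found = 1 } END { exit !found }'
}

# Example:
#   sudo fdisk -l /dev/sda | has_boot_flag && echo "boot flag set" || echo "no bootable partition"
```

If the flag is missing, it can be set by hand as was done in this conversation; with parted that is typically `parted /dev/sda set <N> boot on` for partition number `<N>`.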

2017-02-20 00:25
I'm trying to re-deploy now after having changed the attributes

2017-02-20 00:26
it wasn't happy with 'redeploy all nodes' after committing the deployment, so I'm powering the node back up before trying it again

2017-02-20 00:27
hmm - still an error

2017-02-20 00:33
and it turned the node off again

2017-02-20 00:33
it was about half way thru the 'node roles' thing - I don't think I can 'redeploy all' until the node is 'green' again, right ?

2017-02-20 00:37
dinner time!

greg
2017-02-20 01:16
You can redeploy before that. It should not have turned the node off until all green. You may need to refresh the UI to get the latest data. Also, powering off the node marks it as not alive, and not all green as a consequence.

2017-02-20 15:37
@mech422 it's on my radar and I had a discussion with a past user. it looks like an internal project that was opened. Good validation of the overall type of workflow that we're building as generic cross function.

rstarmer
2017-02-21 19:12
are there instructions for deploying DR on a mac directly? I thought I'd seen them before, but I can't seem to find quite the right incantation.

greg
2017-02-21 19:53
there are and there aren't.

greg
2017-02-21 19:53
Docker made it much harder.

greg
2017-02-21 19:54
Their latest attempt to run docker on Macs does NOT work well with what we need to do in the environment for networking. So, you need the boot2docker based methodology that is harder to find now.

rstarmer
2017-02-21 21:01
Ok, I can just deploy via VM then!

zehicle
2017-02-24 16:15
we made changes to the ansible install last night - there may be issues, we are investigating

zehicle
2017-02-24 16:15
we = I

2017-02-28 00:00
is there an ansible playbook for deploying DR? I see mention of ansible deployment here: http://digital-rebar.readthedocs.io/en/latest/deployment/install/ansible.html but the repo it refers to is nonexistent/private, and it mentions ubuntu 14 target system instead of 16 so I'm assuming the playbook was made private for not being up to date

zehicle
2017-02-28 03:41
Yes, in /deploy/digitalrebar.yml


zehicle
2017-02-28 03:43
run-in-system basically stages the Ansible run

zehicle
2017-02-28 03:44
I'll review that page and update it. We've been focused on the run-in-system install

zehicle
2017-02-28 03:53
@lae - you are right, that page was out of date! Thanks for alerting me. I'm fixing it

zehicle
2017-02-28 04:01
wow - that page had a lot of old stuff. I've pruned it.

2017-02-28 04:54
@lae I've updated the page - try http://digital-rebar.readthedocs.io/en/latest/deployment/install/README.html as a source. We generally recommend using the quickstart script for first setup

2017-02-28 04:54
http://digital-rebar.readthedocs.io/en/latest/deployment/install/quick.html#quick-start

2017-02-28 17:05
Hm, I was hoping to be able to just add a role and set some variables to our infrastructure playbook

zehicle
2017-02-28 17:34
Depending on complexity, real docs on that. We do it by example and code checks.

zehicle
2017-02-28 17:34
On the list of things to document...

2017-02-28 20:21
Hello, I'm trying to install DR on metal with DNS and DHCP. I'd like to use a self-signed cert. I'm running the quickstart with --validate_certs=False flag, but it's still attempting to validate. Any ideas? Thanks in advance!

greg
2017-02-28 20:26
Hi @richie9352 - what is still attempting to validate certs?

greg
2017-02-28 20:28
validate_certs is an ansible flag associated with the get_url task. We don't have that anywhere?

greg
2017-02-28 20:28
What command are you trying to run?

2017-02-28 20:28
Hi @zehicle - The quickstart script. Failed to validate the SSL cert for github-cloud.s3.amazonaws.com:443. I'm assuming this is because I'm running on metal, not aws.

greg
2017-02-28 20:28
hmm - okay - checking

2017-02-28 20:30
Yes, I couldn't find any reference to this flag in the docs.

zehicle
2017-02-28 20:31
@richie9352 is the quickstart.sh not downloading or are you getting the script to start?

greg
2017-02-28 20:31
oh - S3 at Amazon is down starting at about 11:00a CST.

greg
2017-02-28 20:32
That would cause this error.

zehicle
2017-02-28 20:32
!!

2017-02-28 20:32
Yes the script is starting.

2017-02-28 20:32
Ahh! That would cause it

greg
2017-02-28 20:32
Looks like it is still down.

zehicle
2017-02-28 20:33
ouch - yes, worldwide impact

greg
2017-02-28 20:33
soooo - wait for that to get fixed. :slightly_smiling_face:

2017-02-28 20:33
Wow, that's not good


wdennis
2017-02-28 20:33
The cloud will make everything better & easier, they said... ;)

2017-02-28 20:33
Well, at least I know it wasn't my install to blame :)

greg
2017-02-28 20:33
Or my code :slightly_smiling_face:

2017-02-28 20:33
Haha! True

zehicle
2017-02-28 20:34
maybe I should plug the cloud back in...

wdennis
2017-02-28 20:34
Hybrid cloud ftw ;)

2017-02-28 20:35
obligatory relevant xkcd: https://xkcd.com/908/

2017-02-28 20:35
Thanks for the help guys!

greg
2017-02-28 20:36
:slightly_smiling_face:

2017-02-28 21:04
thanks amazon

2017-03-03 15:07
Hi

2017-03-03 15:08
How can we have access to Kubernetes Config? Deployed Kubernetes in Amazon

greg
2017-03-03 15:12
You can edit the role attributes on k8s-config (it is in the upper corner of the deployment view). Changing values and committing the deployment will make the attributes take effect.

greg
2017-03-20 17:49
Hi All - I've updated to the latest kargo and left most of our defaults the same. The big change is that the default api server port has moved from 443 to 6443.

greg
2017-03-20 17:50
So you will need to add :6443 to your url to access the api server.

2017-03-20 18:02
Hello! Can anyone please tell me how to add additional arguments to openstack cloud provider? I'm using v3 api, and I need to add os-user-domain-name and os-identity-api-version, but there are no such fields in provider interface. Thanks in advance!

greg
2017-03-20 18:06
hmm - checking

greg
2017-03-20 18:23
@alex3594 - My guess is that we don't currently handle that.

greg
2017-03-20 18:24
We would need to update the cloudwrap container with a version of the openstack command that supports v3 if it doesn't already, and then add those two fields to the passed through structure and put them on the openstack command.

greg
2017-03-20 18:24
Can you open an issue for this, please?

2017-03-20 18:25
@zehicle Sure, thanks!

2017-03-20 18:30
I created the issue, here is the link: https://github.com/digitalrebar/digitalrebar/issues/237 Thanks!

greg
2017-03-20 18:31
Thanks

zehicle
2017-03-20 18:33
Here's the parameter mapping for the openstack CLI in cloudwrap

zehicle
2017-03-20 18:33
openstack --os-username \'#{endpoint['os-username']}\' " \
"--os-password \'#{endpoint['os-password']}\' " \
"--os-project-name \'#{endpoint['os-project-name']}\' " \
"--os-region-name \'#{endpoint['os-region-name']}\' " \
"--os-auth-url \'#{endpoint['os-auth-url']}\' " \
"#{cmd}"

zehicle
2017-03-20 18:34
what fields do you require for your regular openstack CLI calls?

greg
2017-03-20 18:34
@zehicle - he opened a bug for it with the fields and all.

2017-03-20 18:34
--os-user-domain-name 'default' and --os-identity-api-version '3'
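
For reference, the two extra options requested above could be passed to the openstack CLI roughly as follows. This is a minimal sketch and not the cloudwrap patch itself: `openstack_v3` and the `OPENSTACK_BIN` override are hypothetical names, and the credentials are assumed to come from the usual `OS_*` environment variables.

```shell
# Sketch only: wrap the openstack CLI so every call carries the two Keystone
# v3 options discussed above. openstack_v3 and OPENSTACK_BIN are illustrative
# names, not part of cloudwrap.
openstack_v3() {
  "${OPENSTACK_BIN:-openstack}" \
    --os-username "$OS_USERNAME" \
    --os-password "$OS_PASSWORD" \
    --os-project-name "$OS_PROJECT_NAME" \
    --os-region-name "$OS_REGION_NAME" \
    --os-auth-url "$OS_AUTH_URL" \
    --os-user-domain-name "${OS_USER_DOMAIN_NAME:-default}" \
    --os-identity-api-version "${OS_IDENTITY_API_VERSION:-3}" \
    "$@"
}

# Example (needs real credentials exported first):
#   openstack_v3 server list
```

Setting `OPENSTACK_BIN=echo` prints the assembled arguments instead of calling the CLI, which is a cheap way to check them before a real run.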

zehicle
2017-03-20 18:35
ah, thanks

zehicle
2017-03-21 13:12
I've got a patch out there for that --os issue

greg
2017-03-21 13:19
I'll attempt to pull it in and rebuild the container.

greg
2017-03-21 14:49
alex3594 - Containers and code are updated.

greg
2017-03-21 14:49
You can do a git pull in the digitalrebar directory.

greg
2017-03-21 14:50
You will need to do the following commands from the deploy/compose directory

greg
2017-03-21 14:50
docker-compose restart rebar-api

greg
2017-03-21 14:50
docker-compose stop cloudwrap ; docker-compose rm -f cloudwrap ; docker-compose up -d cloudwrap

greg
2017-03-21 14:50
or just start over.

2017-03-21 16:23
Thank you! Will test it now!

zehicle
2017-03-21 16:59
You should watch the cloudwrap log to make sure parameters are right

2017-03-21 20:04
hey guys

2017-03-21 20:05
I am trying to setup a digitalrebar admin server

2017-03-21 20:05
and I keep getting an error in this ansible task : TASK [Pull compose images [SLOW]]

2017-03-21 20:06
the output basically says that the images are already up-to-date

2017-03-21 20:07
https://gist.github.com/dadicool/e1ee0ab50bcec157dedf0ac4e3f18a57

2017-03-21 20:07
I tried rerunning multiple times as suggested here : http://digital-rebar.readthedocs.io/en/latest/deployment/troubleshooting/specific-environment/Run-In-System.html

2017-03-21 20:07
but I keep hitting the same error ...

2017-03-21 20:08
any suggestions would be welcome!

greg
2017-03-21 20:13
it is me.

greg
2017-03-21 20:13
I built a bad cloudwrap again.

2017-03-21 20:14
:)

2017-03-21 20:14
any reason why the script tries to pull with DR_TAG=master

2017-03-21 20:14
that feels ... adventurous

greg
2017-03-21 20:14
Because your git tree is master. We only currently have a functioning master. Our bad probably.

2017-03-21 20:15
is there anything I could do at this moment to get me a functional setup?

greg
2017-03-21 20:16
no - I need to repush an image. I'd love for people to yell at docker for changing how docker works on a mac, but ...

2017-03-21 20:16
got it

2017-03-21 20:16
is this a matter of minutes, hours, days?

greg
2017-03-21 20:16
And I need time to generate a good branch strategy.

greg
2017-03-21 20:16
hopefully, minutes to 1 hour.

2017-03-21 20:16
got it

2017-03-21 20:17
drop a message here when your push is good to go.

greg
2017-03-21 20:17
We have the tools, but haven't spent the time to cut releases.

2017-03-21 20:17
and thanks!

2017-03-21 20:17
is this something we can help with in the open or is this behind the firewall?

greg
2017-03-21 20:18
All open. In theory, you could build your own image. It is in the containers tree of the source code.

greg
2017-03-21 20:18
./rebuild-containers --pull --force cloudwrap

2017-03-21 20:19
in which folder is that script?

2017-03-21 20:19
I don't see it in the checkout

greg
2017-03-21 20:19
containers

greg
2017-03-21 20:19
digitalrebar/containers

2017-03-21 20:20
I need a working GO environment for rebuilding the containers? fun times :)

greg
2017-03-21 20:20
sorry.

greg
2017-03-21 20:20
you need a sws command.

2017-03-21 20:21
since I don't have my GO env setup on this machine, I will wait for your signal that there is a new cloudwrap image

2017-03-21 20:22
and for reference, I am on ubuntu 16.04LTS with latest docker

2017-03-21 20:22
ubuntu@ip-172-30-0-51:~/digitalrebar/containers$ docker --version
Docker version 17.03.0-ce, build 60ccb22

greg
2017-03-21 20:22
ooo - shiny

2017-03-21 20:22
thanks a lot for the help - hoping for another drop later tonight (in CET here)

greg
2017-03-21 20:22
docker 1.12.5 is what I use on mac.

greg
2017-03-21 20:55
@dadicool - you should be good now.

2017-03-21 21:40
I can confirm the new image looks good - hoping to get my hands on my first digital.rebar environment shortly :)

2017-03-21 21:41
thanks @galthaus

greg
2017-03-21 21:41
awesome! Sorry about the break. We try to keep master runnable always.

greg
2017-03-21 21:42
At some point, it would be cool to hear about what you are doing with DR. publicly or privately.

2017-03-21 21:53
I am happy to share why digital.rebar caught my attention

2017-03-21 21:53
basically, we're a startup operating in the healthcare space in europe

2017-03-21 21:54
and there are pretty specific regulations that prevent us from hosting on public clouds

2017-03-21 21:54
we want to benefit from the latest innovation in infrastructure automation (looking really closely at K8S)

2017-03-21 21:55
but given that the approved hosting providers for healthcare data are 2-3x more expensive than your standard public cloud, we only want to host our production workloads at the "fancy" hosting provider, our dev, qa, etc would be on a public cloud

2017-03-21 21:56
therefore, we want to be able to deploy/manage K8S on public cloud and a "sea of VMs" using the same tools, especially for upgrades and rollouts

2017-03-21 21:57
this is where digital.rebar comes into the picture

2017-03-21 21:57
I really see it as the "spinnaker of infrastructure"

2017-03-21 21:57
hopefully, that's how you guys see it too:)

greg
2017-03-21 21:57
It should be able to do most of that, or it could be added.

greg
2017-03-21 21:58
big huge covering sail that drives everything.

greg
2017-03-21 21:58
nice - not one we've used before.

2017-03-21 21:59
now, my first question is how do I join existing nodes (say the VMs that the hosting provider has spun up for me) to DR?

2017-03-21 21:59
the docs are detailed on the baremetal case and the public cloud case

2017-03-21 22:01
I obviously already have an admin DR running

2017-03-21 22:01
on the same subnet

greg
2017-03-21 22:01
Initially, you will need a script for that. It mostly exists. Let me find it.

greg
2017-03-21 22:01
The second way is to create a provider for your cloud host. I assume they have some kind of API.

2017-03-21 22:02
I have no api available, it's one of those "managed vmware" environments

2017-03-21 22:02
so that script is probably the way to go

greg
2017-03-21 22:02
okay - we lose some manageability, but you may not need it.

2017-03-21 22:02
can you be more explicit?

greg
2017-03-21 22:03
well - we won't necessarily be able to reboot or config bios and other stuff, but that is probably okay.

greg
2017-03-21 22:03
You want post os-install operation most likely.

greg
2017-03-21 22:03
so , the "join" script should be fine.

2017-03-21 22:03
indeed

greg
2017-03-21 22:04
you seem script-aware so here we go.

greg
2017-03-21 22:04
There are two scripts.

greg
2017-03-21 22:06
in digitalrebar/deploy - add-from-ssh.sh - script assumes you have some ssh path to the node in question. It will attempt to ssh into the provided IP and then run a couple of scripts to join the node to rebar so that rebar can ssh into it.

2017-03-21 22:06
let me try that

greg
2017-03-21 22:06
it runs scripts/ssh-copy-id.sh to attempt to setup keys if your current running user can't access the node.

greg
2017-03-21 22:07
Then it runs scripts/join_rebar.sh from scripts/rebar_join.sh

greg
2017-03-21 22:07
BUT FIRST You need to make an edit that I haven't done yet. We don't use these often.

2017-03-21 22:07
what's the edit and can I submit it as a PR after I verify that it works? :-)

greg
2017-03-21 22:08
in all three scripts, find :3000 and remove it.
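
The edit described above can be done in one pass with sed. This is a minimal sketch: `strip_port_3000` is a hypothetical helper name, and the script paths in the usage comment are the ones mentioned in this conversation, assumed to be relative to digitalrebar/deploy.

```shell
# Sketch only: drop the ":3000" port suffix from the given files, keeping a
# .bak copy of each so the change is easy to review or revert.
# strip_port_3000 is an illustrative helper name.
strip_port_3000() {
  for f in "$@"; do
    sed -i.bak 's/:3000//g' "$f" || return 1
  done
}

# Example, run from digitalrebar/deploy:
#   strip_port_3000 add-from-ssh.sh scripts/ssh-copy-id.sh scripts/join_rebar.sh
```

Checking the result with `grep -r ":3000" .` afterwards confirms whether anything was missed.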

greg
2017-03-21 22:09
yep - that is annoying, but missed those on the unified frontend change.

2017-03-21 22:09
ubuntu@ip-172-30-0-51:~/digitalrebar/deploy/scripts$ grep -r ":3000" *
wait_for_rebar.sh:export REBAR_ENDPOINT=${REBAR_ENDPOINT:-https://127.0.0.1:3000}

2017-03-21 22:09
do I need to change wait_for_rebar.sh?

greg
2017-03-21 22:10
checking. It won't hurt. :3000 isn't exposed directly, anymore

greg
2017-03-21 22:10
./add-from-ssh.sh --admin-ip <ADMINIP> <cidr IP of target node>

greg
2017-03-21 22:10
e.g. ./add-from-ssh.sh --admin-ip 192.168.124.11 192.168.124.100/24

2017-03-21 22:11
:3000 no more

greg
2017-03-21 22:11
cool

greg
2017-03-21 22:12
If that completes, you should be able to see the node in the UX.

2017-03-21 22:20
what is the argument to that script : init-ident

greg
2017-03-21 22:22
the user to login the first time.

greg
2017-03-21 22:22
that your system has access to, I think.

greg
2017-03-21 22:23
if your initial allowed user is fred (and it has sudo access), then use fred.

greg
2017-03-21 22:23
for example

2017-03-21 22:23
understood - I am finding a couple of typos in the scripts, working through them :)

2017-03-21 22:25
this is the first ssh call in the add-from-ssh script

2017-03-21 22:25
ssh -oBatchMode=yes -o StrictHostKeyChecking=no root@$IP date

greg
2017-03-21 22:25
we haven't driven them in awhile. Yeah - it is trying to see if your current user has ssh access as root to the IP in question.

2017-03-21 22:25
that clearly assumes that root is a user that is (1) usable for ssh

2017-03-21 22:26
let me adjust things to take the script params into account

greg
2017-03-21 22:26
hmm - yeah.

greg
2017-03-21 22:30
Yes - the first one is testing if the current running system/user combo can log in as root.

greg
2017-03-21 22:30
If not, then you need to run ssh-copy-id.sh to add root access. The ssh-copy-id.sh script takes the init-ident parameter to let you use a non-root user to setup root user access.
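
The access check described above can be parameterized on the user instead of hardcoding root. A minimal sketch, assuming only the ssh options quoted earlier from add-from-ssh.sh; `can_ssh_as` and the `SSH_BIN` override are hypothetical names, and how ssh-copy-id.sh is invoked is deliberately left out because its arguments are not shown here.

```shell
# Sketch only: check whether non-interactive SSH works for a given user.
# can_ssh_as and SSH_BIN are illustrative names; the ssh options are the ones
# quoted from add-from-ssh.sh above.
can_ssh_as() {
  user=$1
  ip=$2
  "${SSH_BIN:-ssh}" -o BatchMode=yes -o StrictHostKeyChecking=no \
    "$user@$ip" date >/dev/null 2>&1
}

# Example: try root first, then the init-ident user (fred, as in the chat).
#   can_ssh_as root 192.168.124.100 || can_ssh_as fred 192.168.124.100
```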

greg
2017-03-21 22:30
The rest of the script assumes root.

greg
2017-03-21 22:31
In the end, you need the following.

2017-03-21 22:31
I am pretty familiar with bash

2017-03-21 22:31
I will get through this and I will try to clean things up and submit it back

greg
2017-03-21 22:31
okay - then I'll quit splaining that.

greg
2017-03-21 22:31
In the end, the goal is that the target node has the following:

greg
2017-03-21 22:32
the ssh access keys from the rebar attribute that is acquired by curl in the root authorized keys file.

greg
2017-03-21 22:32
and that a node has been created in DR, with the rebar-joined node role attached, and the node marked alive.

greg
2017-03-21 22:33
That last one is done by the rebar-join script (calls to rebar).

greg
2017-03-21 22:34
we used this script to add packet nodes that were already existing so it will need to be tailored to your environment.

2017-03-21 22:41
The ssh part is work-ish with some hacking, now to the node registration part

2017-03-21 22:41
I am hitting this : curl: (22) The requested URL returned error: 502 Bad Gateway

2017-03-21 22:42
I am trying to figure out why the admin is responding like that when it's clearly running fine

greg
2017-03-21 22:42
yeah - that is an issue. Give me second.

greg
2017-03-21 22:44
do you have rebar cli in your path?

greg
2017-03-21 22:44
rebar nodes list

2017-03-21 22:45
nope

2017-03-21 22:45
another script typo, it seems :

greg
2017-03-21 22:45
no

greg
2017-03-21 22:45
oh

2017-03-21 22:45
ssh root@$IP /root/join_rebar.sh $ADMIN_IP

2017-03-21 22:46
when join_rebar.sh expects a second arg

greg
2017-03-21 22:46
ok

2017-03-21 22:46
ADMIN_IP=$1 PASSED_IN_IP=$2

2017-03-21 22:46
Let me try to fix this first

2017-03-21 22:47
ok, that didn't help

2017-03-21 22:47
so what about the rebar cli

2017-03-21 22:47
I must have missed the pre-step in the docs

greg
2017-03-21 22:47
I was looking at wrong curl. You are farther than I thought

greg
2017-03-21 22:48
add nodes curl call

2017-03-21 22:48
I am trying to add some echo logging to figure out which curl call in join_rebar is blowing up

greg
2017-03-21 22:49
Okay the PASSED_IN_IP isn't needed unless you want to explicitly declare the IP to use. If these aren't multi-homed boxes, it shouldn't matter.

greg
2017-03-21 22:49
The other thing is to log into your system and run the script from there with -x set.

greg
2017-03-21 22:50
This is in dire need of updating.

2017-03-21 22:51
:)

2017-03-21 22:51
I logged into the system I am trying to join to rebar

2017-03-21 22:51
and ran join_rebar by hand

greg
2017-03-21 22:54
yeah - I'll probably have to update this.

2017-03-21 22:54
This is the curl that's returning a 502 bad gateway

2017-03-21 22:54
exists=$(curl -k -s -o /dev/null -w "%{http_code}" --digest -u "$REBAR_KEY" \
  -X GET "$REBAR_WEB/api/v2/nodes/$HOSTNAME")

greg
2017-03-21 22:55
hmm what are rebar_key and rebar_web

2017-03-21 22:56
export REBAR_KEY="$REBAR_USER:$REBAR_PASSWORD"
export REBAR_WEB="https://$ADMIN_IP"

greg
2017-03-21 22:56
Have you logged in the UX?

2017-03-21 22:57
yes

greg
2017-03-21 22:57
good.

greg
2017-03-21 22:57
just making sure

greg
2017-03-21 22:57
from the node do this:

greg
2017-03-21 22:58
curl http://<admin_ip>:8092/files/rebar -o rebar

greg
2017-03-21 22:58
chmod +x rebar

greg
2017-03-21 22:58
export REBAR_ENDPOINT=https://<ADMIN_IP>

greg
2017-03-21 22:58
export REBAR_KEY=rebar:rebar1

greg
2017-03-21 22:58
./rebar nodes list

2017-03-21 22:59
ubuntu@k8s-node1:~$ curl -v http://172.30.0.51:8092/files/rebar
* Trying 172.30.0.51...
* Connected to 172.30.0.51 (172.30.0.51) port 8092 (#0)
> GET /files/rebar HTTP/1.1
> Host: 172.30.0.51:8092
> User-Agent: curl/7.47.0
> Accept: */*
>
* Connection #0 to host 172.30.0.51 left intact

2017-03-21 22:59
I am getting nothing back

greg
2017-03-21 23:00
is the admin node multi-homed?

greg
2017-03-21 23:00
hmm - that shouldn't matter.

greg
2017-03-21 23:00
at least not for this.

2017-03-21 23:00
it's not

2017-03-21 23:00
and that's the IP that I access the UI at

greg
2017-03-21 23:01
on the admin node as the user you run the install script from, ls ~/.cache/digitalrebar/tftpboot/files

2017-03-21 23:01
looking at the various docker containers of the admin, I don't see any port 8092 that is open ...

greg
2017-03-21 23:01
oh - sigh.

greg
2017-03-21 23:01
do you have a provisioner?

2017-03-21 23:01
not yet

2017-03-21 23:02
how does that work for nodes that are already existing?

greg
2017-03-21 23:02
thinking about it.

greg
2017-03-21 23:02
okay - should be okay, but you don't have a quick and handy rebar cli to get.

greg
2017-03-21 23:02
Sooo - from the admin node do this:

greg
2017-03-21 23:03
docker cp compose_rebar_api_1:/usr/local/bin/rebar rebar

greg
2017-03-21 23:03
that will get you one to play with

2017-03-21 23:03
got it

2017-03-21 23:03
now I want to push this to the node I want to join, right?

greg
2017-03-21 23:03
sure. Let's go for it.

2017-03-21 23:04
got it

greg
2017-03-21 23:04
that is handy to keep around on the admin node because you can tweak things and get at some things programmatically easier with it.

greg
2017-03-21 23:05
```
chmod +x rebar
export REBAR_ENDPOINT=https://<ADMIN_IP>
export REBAR_KEY=rebar:rebar1
./rebar nodes list
```

2017-03-21 23:05
ubuntu@k8s-node1:~$ ./rebar nodes list [ { "admin": false, "alive": true, "allocated": false, "arch": "x86_64", "available": true, "bootenv": "local", "created_at": "2017-03-21T21:43:48.650Z", "deployment_id": 1, "description": "", "icon": "check_circle", "id": 1, "name": "system-phantom.internal.local", "node-control-address": null, "os_family": "linux", "profiles": [], "provider_id": 1, "quirks": [], "state": 0, "system": true, "target_role_id": null, "tenant_id": 1, "updated_at": "2017-03-21T21:43:48.650Z", "uuid": "557d47bc-4abd-44a9-83e8-58d6176ffe1a", "variant": "phantom" }

2017-03-21 23:05
already did

greg
2017-03-21 23:05
yeah - something

greg
2017-03-21 23:05
so the node can drive it.

greg
2017-03-21 23:05
The curl commands are wonky.

2017-03-21 23:06
is the plan to replace the curls with the rebar cli ?

greg
2017-03-21 23:06
The curl worked for me on my system here.

greg
2017-03-21 23:06
So - I was wanting to make sure that something worked.

greg
2017-03-21 23:07
If you could add set -x to the top of the script and dump the curl command that is executing for me to try would be helpful.

2017-03-21 23:07
right away

2017-03-21 23:08
+ curl -k -f -g --digest -u rebar:rebar1 -X POST -d name=k8s-node1.xxxx -d ip=172.30.yyy.zzz/24 -d provider=metal -d variant=metal -d os_family=linux -d arch=x86_64 https://172.30.0.51/api/v2/nodes/

greg
2017-03-21 23:09
okay - here you go- change the curl to this:

2017-03-21 23:10
I think the curl I mentioned further above was the wrong one

2017-03-21 23:10
This is the one in the script that is blowing up

2017-03-21 23:10
curl -k -f -g --digest -u "$REBAR_KEY" -X POST \
  -d "name=$HOSTNAME" \
  -d "ip=$IP" \
  -d "provider=metal" \
  -d "variant=metal" \
  -d "os_family=linux" \
  -d "arch=$(uname -m)" \
  "$REBAR_WEB/api/v2/nodes/" || {

greg
2017-03-21 23:12
```
curl -k -f -g --digest -u "$REBAR_KEY" -X POST \
  -d "{ \"name\": \"$HOSTNAME\", \
        \"ip\": \"$IP\", \
        \"provider\": \"metal\", \
        \"variant\": \"metal\", \
        \"os_family\": \"linux\", \
        \"arch\": \"$(uname -m)\" }" \
  "$REBAR_WEB/api/v2/nodes/"
```

2017-03-21 23:13
There is a missing } somewhere

greg
2017-03-21 23:13
```
curl -k -f -g --digest -u "$REBAR_KEY" -X POST \
  -d "{ \"name\": \"$HOSTNAME\", \
        \"ip\": \"$IP\", \
        \"provider\": \"metal\", \
        \"variant\": \"metal\", \
        \"os_family\": \"linux\", \
        \"arch\": \"$(uname -m)\" }" \
  "$REBAR_WEB/api/v2/nodes/"
```

2017-03-21 23:14
trying

2017-03-21 23:17
youhou

2017-03-21 23:17
I got my first node in :)

greg
2017-03-21 23:17
okay

greg
2017-03-21 23:17
did the other curls value or work?

2017-03-21 23:18
I just ran that curl on the cli and the node appeared

2017-03-21 23:18
let me fix the script and try to run it completely

greg
2017-03-21 23:19
it won't, but it is better. :slightly_smiling_face:

2017-03-21 23:24
I have another one that's blowing up :

2017-03-21 23:24
curl -k -f -g --digest -u rebar:rebar1 -X POST -d '{"node":"k8s-node1.xxx.zzz" , "role":"rebar-joined-node" }' https://172.30.0.51/api/v2/node_roles/

2017-03-21 23:28
ok, it's all good

2017-03-21 23:28
a couple of missing things

2017-03-21 23:28
really need to have : -H "Content-Type: application/json"

greg
2017-03-21 23:28
yes

greg
2017-03-21 23:28
Did it make it through?

2017-03-21 23:28
that's what was leading to the 502
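
Putting the fix together: the JSON body and the Content-Type header go on the same call. A minimal sketch, assuming `REBAR_KEY` and `REBAR_WEB` are exported as in join_rebar.sh; `rebar_post_json` and the `CURL_BIN` override are hypothetical names, and the node name in the example is a placeholder.

```shell
# Sketch only: POST a JSON body to the rebar API with the Content-Type header
# that was missing. rebar_post_json and CURL_BIN are illustrative names; the
# curl flags are the ones used in the calls quoted above.
rebar_post_json() {
  path=$1
  body=$2
  "${CURL_BIN:-curl}" -k -f -g --digest -u "$REBAR_KEY" -X POST \
    -H "Content-Type: application/json" \
    -d "$body" \
    "$REBAR_WEB$path"
}

# Example (placeholder node name):
#   rebar_post_json /api/v2/node_roles/ \
#     '{"node":"k8s-node1.example.local","role":"rebar-joined-node"}'
```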

2017-03-21 23:28
yeap, full script made it through

2017-03-21 23:29
need to try running it now from the add-from-ssh.sh

greg
2017-03-21 23:29
so your node should have start starting to run on it.

greg
2017-03-21 23:29
stuff that is.

greg
2017-03-21 23:29
I think.

greg
2017-03-21 23:29
You should see it in the system deployment with some node roles

2017-03-21 23:29
I see

2017-03-21 23:30
side note : if rebar expects to use root to remote ssh into nodes under management, that's a real problem

greg
2017-03-21 23:31
it currently does.

greg
2017-03-21 23:31
We can talk about that at some point.

2017-03-21 23:31
hmm

2017-03-21 23:31
ok

greg
2017-03-21 23:33
I think it will be a "small" change for your environment, but has consequences that I think I need to work through. We should change away from root access. The problem is that it requires sudo because some of the actions need root abilities. That isn't always done consistently.

greg
2017-03-21 23:33
for example, do you have unrestricted sudo access?

greg
2017-03-21 23:33
or is it command level enablement?

2017-03-21 23:34
for the user I would use for rebar, it would be unrestricted

greg
2017-03-21 23:34
okay - so, we have this thing called an SSH hammer. It is the underlying component that provides SSH access to the rest of the system.

greg
2017-03-21 23:34
It assumes that the root user has the keys in place.

2017-03-21 23:35
I understand that that is a sensible assumption for bare metal environments

2017-03-21 23:35
I am just a little surprised that the same practice carried over to the cloud provisioners for example

greg
2017-03-21 23:35
it is and it isn't. ubuntu makes us do extra gyrations

2017-03-21 23:35
(at least, that's my conclusion)

greg
2017-03-21 23:35
common shared code

greg
2017-03-21 23:36
we try to treat everything the same after certain points from a normalization perspective.

2017-03-21 23:36
classic cross-OS scripting ...

greg
2017-03-21 23:36
That way same code runs in both places. Part of the value. Normalize hardware so the cloud instance looks like the hardware instance and vice versa.

greg
2017-03-21 23:37
Yeah - but in this case it is more at platform boundaries as well.

2017-03-21 23:37
is there a way on gitter to send a screen capture?

greg
2017-03-21 23:37
not sure.

greg
2017-03-21 23:37
email:greg@rackn.com if that is easier

greg
2017-03-21 23:38
For a quick check, can you do this for me from the admin node:

greg
2017-03-21 23:38
rebar hammers list

2017-03-21 23:39
ubuntu@ip-172-30-0-51:~/digitalrebar/deploy$ ./rebar hammers list [ { "actions": { "power": [ "reboot" ], "run": [ "run" ], "xfer": [ "copy_from", "copy_to" ] }, "available_hammer_id": 4, "endpoint": null, "id": 1, "name": "ssh", "node_id": 2, "priority": 0, "username": "root", "uuid": "c117a7d0-2b89-44bd-8333-82a5b39801ff" } ]

2017-03-21 23:40
I think the "node prep" isn't starting because the admin doesn't know how to ssh to node

greg
2017-03-21 23:40
that is the secure shell hammer (you don't have an IPMI because you aren't hardware)

greg
2017-03-21 23:40
okay - probably.

greg
2017-03-21 23:40
I suspect that we are missing a parameter on node create.

greg
2017-03-21 23:41
By the way, to start over. just delete the NODE in the UI and rerun the join script.

greg
2017-03-21 23:41
let me check real quick

greg
2017-03-21 23:41
The username in hammer could be changed to something else, but a core change is also needed.

greg
2017-03-21 23:41
no container rebuilds though

2017-03-21 23:42
what does this mean : "but a core change is also needed."

greg
2017-03-21 23:42
the rails app needs two changes. We didn't use username correctly in all the places and you need a way to set something other than root.

2017-03-21 23:43
yeap - understood

2017-03-21 23:44
ah

2017-03-21 23:44
I took a step back and I reran add-from-ssh after dropping in the updated join_rebar.sh script and things seem to be moving somehow

greg
2017-03-21 23:44
okay

2017-03-21 23:45
all components are green in the UI

2017-03-21 23:45
hmm

2017-03-21 23:45
It's getting late here (1am almost)

2017-03-21 23:46
I am gonna fork the repo, push my fixes to my fork, try to clean up some things that I hardcoded to ubuntu

2017-03-21 23:46
and try with a second node out of the box tomorrow as I find time.

greg
2017-03-21 23:46
okay - I'm missing family dinner time here. :slightly_smiling_face: Thanks for playing along and working through.

greg
2017-03-21 23:46
That would be great. I'll be around tomorrow.

2017-03-21 23:46
sounds good

2017-03-21 23:46
thanks for all the help

2017-03-21 23:47
I understand we're a special case but I do believe we're not alone though

2017-03-21 23:47
to be continued

2017-03-21 23:47
cheers

greg
2017-03-21 23:47
no prob. This is good for us too. You are and not. Later.

2017-03-23 16:30
Hello, I'm new to DigitalRebar and want to try some things out before I advise on a customer project at my company. One thing I have found is that it needs full internet access for some installation steps. Can you provide a list of the packages used/needed by DigitalRebar, so we can be sure all of them are available in a customer repository? In a productive customer environment we have no chance of direct internet access; all packages will be stored on a private Satellite server, for example. Thanks!

greg
2017-03-23 16:43
@theta-my - is this for the initial install of digital rebar or post-install of Digital rebar and you are trying to provision the managed nodes?

2017-03-23 16:47
Hi, this is first of all for the initial installation steps via quickstart.sh.

greg
2017-03-23 16:47
okay - so installing digitalrebar.

2017-03-23 16:47
yes

greg
2017-03-23 16:47
It is not packages based. It is container based. So, you will need docker packages, ansible, and python packages.

2017-03-23 16:48
Another approach could be that I find the missing packages in a log...

greg
2017-03-23 16:48
Then you will need a way for docker to get images from docker hub and then you will need to stage some boot discovery images in a cache directory.

greg
2017-03-23 16:49
There are a lot of steps that we haven't documented outside of code I'm afraid.

2017-03-23 16:49
I know, and that's my problem ;)

greg
2017-03-23 16:49
Our assumption is that the admin node will have at least proxy-based internet access. We are trying to change some of that in the coming months, but we aren't there yet.

greg
2017-03-23 16:50
Once installed, the admin node can be configured to point at internal repos and things like that to address the internet gap.

2017-03-23 16:51
I can understand your problem, but in a proper production environment no one will give you a direct link to the internet to download some stuff without a security check.

2017-03-23 16:52
And the best way to avoid this need is to have our own repository. That's why I need the list.

greg
2017-03-23 16:52
well - we've found that many differentiate tool install from tool use, but I understand your point.

2017-03-23 16:53
A proper log, where I can find all needed packages in a first (dry) run, could be a good solution to reduce the effort on your side...

greg
2017-03-23 16:54
Tool use can use other repos. Tool install currently can't. You can get close from an internet-accessible location: docker pull all of our images, docker export them, and then docker import them on the admin node. That will get you most of the way to running digitalrebar without internet access, but there are all sorts of little gotchas. It just hasn't come up as a high priority item. It is something we are tracking because we hear about it as a concern but not a blocker (until now).
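
The offline staging idea above can be sketched as follows. Note that for images (as opposed to containers) the Docker CLI commands are `docker save` and `docker load`. This is only a sketch under assumptions: the image names in the example are placeholders, not the real Digital Rebar image list, and `save_images`, `load_images` and the `DOCKER_BIN` override are hypothetical names.

```shell
# Sketch only: stage images on an internet-connected host, then load them on
# the isolated admin node. save_images, load_images and DOCKER_BIN are
# illustrative names.

# Pull each image and write all of them into one tar archive.
save_images() {
  archive=$1
  shift
  for img in "$@"; do
    "${DOCKER_BIN:-docker}" pull "$img" || return 1
  done
  "${DOCKER_BIN:-docker}" save -o "$archive" "$@"
}

# Load a previously saved archive into the local image store.
load_images() {
  "${DOCKER_BIN:-docker}" load -i "$1"
}

# Example (placeholder image names):
#   save_images dr-images.tar example/image-one:master example/image-two:master
#   ...copy dr-images.tar to the admin node, then:
#   load_images dr-images.tar
```

This covers only the container images; the boot discovery images and the other gotchas mentioned in the chat still need separate handling.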

greg
2017-03-23 16:55
What is the OS you are using for the admin node?

2017-03-23 16:56
This must run on RHEL7

2017-03-23 16:57
I will try to go further with minimum tools/ services you have described and will come back if i need more... ;)

2017-03-23 16:57
Thanks for your attention!

greg
2017-03-23 16:58
The base system will need: sudo, git, ansible, python-netaddr, curl, jq (epel-release required)

greg
2017-03-23 16:58
This will get you ansible capable.

greg
2017-03-23 16:59
You will need this: curl -so rebar https://s3-us-west-2.amazonaws.com/rebar-bins/linux/amd64/rebar - in /usr/local/bin and chmod +x

greg
2017-03-23 16:59
on the admin node target as a user:

greg
2017-03-23 16:59
you will need the digitalrebar repo pulled from git as the directory digitalrebar.
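
The preparation steps listed above could be scripted roughly as follows. This is a sketch under assumptions: a yum-based admin node (RHEL 7 is mentioned above), repositories that can actually serve the packages, and the GitHub URL for the digitalrebar repo, which is inferred from the quickstart URL used elsewhere in this channel. It prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run helper: print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_admin_node() {
    # Package steps need root. EPEL is required for some of the packages.
    run yum install -y epel-release
    run yum install -y sudo git ansible python-netaddr curl jq
    # Fetch the rebar CLI, then install it as an executable in /usr/local/bin.
    run curl -so rebar https://s3-us-west-2.amazonaws.com/rebar-bins/linux/amd64/rebar
    run install -m 0755 rebar /usr/local/bin/rebar
    # As a regular user: check out the repo into a directory named "digitalrebar".
    run git clone https://github.com/digitalrebar/digitalrebar digitalrebar
}

prepare_admin_node
```

On a host without internet access, the `yum`, `curl` and `git clone` sources would each need to point at an internal mirror or a copied file instead.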

2017-03-23 17:01
The last step is clear. I did a git clone on my workstation and copied the complete download to the target server. Then I ran the quickstart script, and it ended with one error

2017-03-23 17:01
no proper epel repository access...

greg
2017-03-23 17:02
At that point, you should be able to run quickstart.sh (assuming other packages are installed).

greg
2017-03-23 17:02
So, what error do you get?

greg
2017-03-23 17:03
Did you add the provisioner flag to quickstart? You probably should.

2017-03-23 17:03
TASK [Install EPEL [SLOW]] ***************************************************** failed: [10.241.236.92] (item=[u'epel-release']) => {"changed": true, "failed": true, "item": ["epel-release"], "msg": "warning: /var/cache/yum/x86_64/7Server/BA-20170319-epel_rhel7_x86_64/packages/epel-release-7-7.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY\n\n\nPublic key for epel-release-7-7.noarch.rpm is not installed\n Warning: Due to potential bad behaviour with rhnplugin and certificates, used slower repoquery calls instead of Yum API.", "rc": 1, "results": ["Loaded plugins: langpacks, product-id, rhnplugin, search-disabled-repos,\n : subscription-manager\nThis system is receiving updates from RHN Classic or Red Hat Satellite.\nResolving Dependencies\n--> Running transaction check\n---> Package epel-release.noarch 0:7-7 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package Arch Version Repository Size\n================================================================================\nInstalling:\n epel-release noarch 7-7 BA-20170319-epel_rhel7_x86_64 14 k\n\nTransaction Summary\n================================================================================\nInstall 1 Package\n\nTotal download size: 14 k\nInstalled size: 24 k\nDownloading packages:\nPublic key for epel-release-7-7.noarch.rpm is not installed\n"]}

2017-03-23 17:04
@theta-my were you able to get an end-to-end system running before trying to get DR running w/o internet connectivity?

2017-03-23 17:04
not for this case, but before you tried to run disconnected...

2017-03-23 17:05
only on a laptop on my side; yes, that should work

2017-03-23 17:06
you mean, I install all this stuff on a local system and provide the docker container to the target location?

2017-03-23 17:07
that should work; then I only need the base services and tools to run docker.

2017-03-23 17:07
good idea

2017-03-23 17:08
Sometimes I can't see the forest for the trees... :)

2017-03-23 17:10
I will try this approach out. Thanks!

2017-03-25 19:06
Hi all. Just got started with a 3 node Dell r610 cluster. Just looking to see if I can also have the Admin Node run a workload as well.

2017-03-25 19:15
Also looking at clustering Admin Nodes between hosts.

2017-03-25 19:30
@Orionx86 - you already have rebar running or are you trying to get started?

2017-03-25 19:31
the default install will include several workloads, Kubernetes is the one I'd recommend first

2017-03-25 19:32
clustering of Rebar admin would likely take some 1x1 discussion

2017-03-25 19:32
building a k8s cluster should work fine - will require an external load balancer


2017-03-25 19:33
I have rebar running already on a dell Centos Admin node

2017-03-25 19:33
getting no available nodes for kubernetes wizard though

2017-03-25 19:34
nice, for the wizard, you need to have discovered nodes in a deployment (defaults to system)

2017-03-25 19:34
or create a provider and install to cloud

2017-03-25 19:34
the wizard will let you choose which deployment to use as the source for "use existing nodes"

2017-03-25 19:34
Can I add to system? won't let me throw on to it.

2017-03-25 19:35
discovered nodes go into system when they are booted

2017-03-25 19:35
are you trying to add existing nodes?

2017-03-25 19:35
I will but I wanted to see if I can also use the admin node since most of it is currently unused

2017-03-25 19:36
ah, not a recommended config because systems could be asked to reboot

2017-03-25 19:36
you CAN install the kvm-slaves and boot those locally

2017-03-25 19:36
we do that all the time to run the provisioning cycle

2017-03-25 19:37
gotcha. worth a try. Just sad because I have a 16 thread/32 gb system and im using threads and 4 gb's

2017-03-25 19:37
4 thread/4gbs

2017-03-25 19:37
yeah, that's what we use core/tools/kvm-slave for ... getting a doc link

2017-03-25 19:37
Just about to spin up node 2. This is a great system. I've watched a bunch of your videos so far

2017-03-25 19:38
http://digital-rebar.readthedocs.io/en/latest/development/dev_env/kvm-slaves.html

2017-03-25 19:38
thanks!

2017-03-25 19:38
Will you guys be at docker con this year?

2017-03-25 19:38
yes, I'll be there (attending, no booth)

2017-03-25 19:38
we're based in Austin, so the team's nearby. (also, my daughter is presenting)

2017-03-25 19:39
I'll be in town all week. That's great! whats she presenting?

2017-03-25 19:42
talking about workplace diversity on Tuesday 11:05 - Moby's Theater, EH-4 & EH-5

2017-03-25 19:42
cool I'll make it a point to attend.

2017-03-25 19:44
I'll be online for a while if you have more questions

2017-03-25 19:45
yeah, definitely. I saw there is integration with the 4 big automation tools but haven't seen too much on SaltStack. Are there any docs on it that I'm missing? Looking to build a workflow around VMware

2017-03-25 19:46
Salt Proxy Minion is the obvious choice for me

2017-03-25 19:52
it's been a long time since we worked on the salt integration. it's still there but likely needs work

2017-03-25 19:52
ansible & bash are our primary targets for latest dev

2017-03-25 19:53
provisioning VMware or using a VMware provider?

2017-03-25 19:53
Yes, a provider.

2017-03-25 19:54
for providers, check out the pattern w/ the openstack provider. it's the most recent and splits all the code into a dedicated file.

2017-03-25 19:55
I'm happy to put together a training video on how to build a provider

2017-03-25 19:56
dusting off the salt jig would likely involve talking w/ @galthaus because there are some choices that we could revisit based on the use-cases. Right now, the plan was mainly to plug it into an existing salt server and turn over control. We've got some more integration capabilities that could be used.

2017-03-25 19:59
I understand. I've been toying with openstack ironic and saltstack in my org so this is a welcome addition if I can get this up and running in my homelab at least.

2017-03-25 20:05
Have you seen any issues with people using DDWRT configs with PXE booting?

greg
2017-03-26 14:09
Not familiar with DDWRT off the bat.

2017-03-26 17:42
http://www.dd-wrt.com/help/english/HManagement.asp ? We used to see issues w/ port fast settings on switches not being set correctly.

greg
2017-03-26 22:28
Okay. We routinely tell people to set portfast on server ports, to keep iPXE firmware from timing out

2017-04-01 23:12
Hello, how can I run the vagrant/compose setup with DHCP?

2017-04-01 23:27
vagrant as the admin or nodes? I could never figure out how to make the vagrant respect DHCP boot

2017-04-01 23:28
@chilicat what are you trying to do w/ Vagrant? We don't use it that much

2017-04-01 23:28
not sure what you mean with admin or nodes... DR runs on the base node as docker container

2017-04-01 23:28
are you running the base via Vagrant?

2017-04-01 23:28
yes

2017-04-01 23:29
(btw: your admin password issue may just be complexity requirements - the password must be alpha and numeric)

2017-04-01 23:29
yes the password must be "non-simple" just wanted to report that the ui doesn't seem to tell you :)

2017-04-01 23:30
ah, will add some guidance

2017-04-01 23:30
I could never find a way for a Vagrant node to DHCP boot - it seems like there's a very strong assumption that you start with a working image.

2017-04-01 23:31
so, you'd have to mix and match VirtualBox VMs & Vagrant to simulate boot. In that case, I'd recommend using 100% virtual box.

2017-04-01 23:31
in vagrant there is a docker setup - maybe my question is not vagrant specific. I want to add the dhcp service to my setup (it's off by default)

2017-04-01 23:31
do I use ./init_files.sh ?

2017-04-01 23:31
if you use run-in-server.sh then add --con-dhcp --con-provisioner

2017-04-01 23:33
the recommended approach is to rerun run-in-server with the added commands and it will update the configs.

2017-04-01 23:33
I will try that.

2017-04-01 23:33
does it also use the docker compose files? or is that a different setup?

2017-04-01 23:33
digitalrebar/deploy/compose

2017-04-01 23:35
run-in-server runs the Ansible digitalrebar.yml file

2017-04-01 23:35
that file then runs the command: "./setup.sh --tag {{ dr_tag }} {{ dr_workloads | default([]) | join(" ") }}" to correctly configure docker-compose config

2017-04-01 23:36
then rebar runs with docker-compose up

2017-04-01 23:36
obviously, there's a lot of other setup steps, but that's the key elements of the DR config

2017-04-01 23:37
I see, yeah, it all leads to setup.sh. I'll try my luck

2017-04-01 23:37
thanks

2017-04-01 23:37
So, YES. Everything ultimately runs to docker-compose.

2017-04-01 23:38
even the core/tools/docker-admin.sh scripts are doing the same thing. They just assume that you've already got the pre-reqs.

2017-04-01 23:39
we

2017-04-01 23:40
if all you need is DHCP/Provision.

2017-04-01 23:40
preview github.com/rackn/rocket-skates

2017-04-01 23:41
I also need the deployment mechanisms, but a slim DHCP/PXE/TFTP in Go sounds good

2017-04-01 23:42
is there any content management for DR?

2017-04-01 23:46
We use deployments to manage parallel application deployments and then track the state of them using the roles system. Not exactly what I'd consider content management. Do you have a specific use case or platform in mind?

2017-04-01 23:48
In an on-premise deployment we must deliver all our software packages (rpm, zip, docker containers, isos). I was just wondering if there was something where you manage the content and associate content with a deployment.

2017-04-01 23:48
A little like Katello

2017-04-01 23:49
Ok was able to add the dhcp service... vagrant uses quickstart.sh which also accepts the parameter you gave me.

2017-04-01 23:50
glad the vagrant is working! it's been a while since I was testing it

2017-04-01 23:50
I had an issue with the swapfile... I switched back to an older Ubuntu, but otherwise it works

2017-04-01 23:51
got it, there are a few ways to handle that, yes. we've done "air gap" deploys where there's no external connectivity. Your request is a variant. takes some twiddling. @galthaus would be a better person to answer.

2017-04-01 23:51
I saw the issue, thanks for opening it - that helps me know what to check

2017-04-01 23:53
since you can add your own roles, you could just inject a few bash or ansible commands that would pre/set the environment. also, the proxy & mirrors are configurable so you can point those to internal resources

2017-04-01 23:53
One more question: is it possible to use DR when you have absolutely no control over DNS and DHCP? I know the functionality is very limited in this case, but would it still be possible to deploy software on existing machines?

2017-04-01 23:55
100% yes. That's a pretty normal scenario. We would just need forward or next boot configured for DHCP (e.g.: we work w/ infoblox). For DNS, that's configurable too. We can own, delegate, drive or ignore DNS. Pretty flexible about that.

2017-04-01 23:57
this is old, but mostly right > http://digital-rebar.readthedocs.io/en/latest/deployment/old/external-services.html?highlight=dns

2017-04-01 23:58
I found an out of date page about DHCP that I'll fix. please disregard > http://digital-rebar.readthedocs.io/en/latest/faq/dedicated_dhcp.html?highlight=dhcp

2017-04-01 23:58
thanks.

2017-04-01 23:58
cool, my VM's PXE request works... just TFTP times out :)

2017-04-01 23:59
ah... that's sadly common. TFTP and Docker port mapping are not friends.

2017-04-01 23:59
that's a primary reason why we're doing Rocket Skates as a stand alone binary

2017-04-02 00:00
the last line is: "TFTP..."

2017-04-02 00:00
maybe it is doing something?

2017-04-02 00:00
it's going to be inconsistent because the TFTP ports are being assigned into ranges that Docker is unhappy about

2017-04-02 00:00
if you scroll back in this chat history, you'll see discussions about this.

2017-04-02 00:02
which container has the TFTP server?

2017-04-02 00:03
provisioner?

2017-04-02 00:03
yes

2017-04-02 00:05
sorry, gtr... the TFTP discussion from earlier was on 2/16 w/ mech422 and @galthaus

2017-04-02 00:07
but dns is also in the low range... that is not a problem for docker?

greg
2017-04-02 01:08
It is really a protocol problem.

greg
2017-04-02 01:09
Make sure you have tftp nat modules loaded into your kernel.

2017-04-02 01:20
I guess that would be "modprobe nf_nat_tftp"
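
A small sketch for checking whether the TFTP helper modules are loaded before reaching for `modprobe`. It assumes that `nf_conntrack_tftp` is wanted alongside `nf_nat_tftp` (only the latter is named in this conversation) and that the modules are built as loadable modules; helpers compiled into the kernel do not appear in `/proc/modules`. The function name is made up for illustration.

```shell
#!/bin/sh
# List which TFTP helper modules are absent from a /proc/modules-style file.
# Usage: missing_tftp_modules [modules_file]
missing_tftp_modules() {
    modules_file="${1:-/proc/modules}"
    for mod in nf_conntrack_tftp nf_nat_tftp; do
        # Each line of /proc/modules starts with the module name and a space.
        if ! grep -q "^$mod " "$modules_file" 2>/dev/null; then
            echo "$mod"
        fi
    done
}
```

On the Docker host this could be used as `for m in $(missing_tftp_modules); do sudo modprobe "$m"; done`.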

2017-04-02 01:20
have to try that tomorrow.

2017-04-02 01:20
thanks for your help

greg
2017-04-02 01:20
yeah

2017-04-02 03:46
Figured out what the problem was. The vagrant box has two interfaces, and eth1 is on the shared network, but the "Next Server" option is set to the IP address of eth0.

greg
2017-04-02 13:23
Awesome

2017-04-03 17:42
What would y'all use for a bunch of metal other than OpenStack?

2017-04-04 00:42
Kubernetes

2017-04-04 00:43
Or just Ansible if you had an app already

2017-04-04 15:02
I mean to provide VMs. I need a whole bunch of discrete network stacks to prove out architectures. Can kvm-slave.sh work across metal?

greg
2017-04-04 15:06
currently - no. Do the VM sets need to have isolated networks?

2017-04-04 15:27
Nope.

2017-04-04 15:27
A "nice to have" option.

greg
2017-04-04 15:28
hmm - okay - I don't have a good option for any answer currently.

greg
2017-04-04 15:30
One is to do the k8s/flat hack. Take a set of machines, run docker on them with sub ranges and a super net. Then use tools/kvm-slave to start vms on that docker bridge, but that is ugly and fraught with peril.

zehicle
2017-04-04 16:12
there was one request that would treat each node as a dedicated cloud (and thus isolated). You could use Rebar to build node islands that would then be isolated.

zehicle
2017-04-04 16:13
cross node networking is always interesting - that's why SDN layers get involved

2017-04-06 20:05
"interesting" as in, omg, encapsulation. :)

2017-04-07 02:38
Is this no longer kosher? How to get more detailed debug info? rebar --debug nodes create '{"name"; "os1.newgoliath.com", "bootenv": "local"}' 2017/04/06 22:36:01 Talking to Rebar with https://127.0.0.1 (rebar:reebaar) 2017/04/06 22:36:01 Unable to create new node: Expected status in the 200 range, got 400 Bad Request

greg
2017-04-07 02:42
Rebar nodes update name json

greg
2017-04-07 02:42
Assuming os1 exists

2017-04-07 02:43
fresh install. no nodes

2017-04-07 02:44
rebar whoami 2017/04/06 22:37:44 Cannot determine what node this is: Expected status in the 200 range, got 404 Not Found

2017-04-07 02:45
just the phantom

2017-04-07 02:46
hostname -f os1.newgoliath.com

2017-04-07 02:48
duh, I see it. semicolon
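
The 400 above came from a `;` where a `:` belongs in the JSON payload. A sketch of a pre-flight check that would catch this before the payload reaches the CLI: it uses `jq` when available (listed as a prerequisite earlier in this log) and falls back to Python's `json.tool`. The function name is hypothetical and not part of the rebar CLI.

```shell
#!/bin/sh
# Return 0 if the argument parses as JSON, 1 if it does not,
# and 2 if no validator (jq or python3) is installed.
json_is_valid() {
    if command -v jq >/dev/null 2>&1; then
        # `type` yields a non-null string for any parsed value; -e turns
        # "no valid result" into a non-zero exit status.
        printf '%s' "$1" | jq -e type >/dev/null 2>&1 || return 1
    elif command -v python3 >/dev/null 2>&1; then
        printf '%s' "$1" | python3 -m json.tool >/dev/null 2>&1 || return 1
    else
        echo "neither jq nor python3 found" >&2
        return 2
    fi
}

payload='{"name": "os1.newgoliath.com", "bootenv": "local"}'
if json_is_valid "$payload"; then
    echo "payload looks like valid JSON"
else
    echo "payload rejected or could not be checked" >&2
fi
```

A guarded call would then look like `json_is_valid "$payload" && rebar nodes create "$payload"`.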

2017-04-07 02:51
Thank you, Greg!

greg
2017-04-07 02:52
:slightly_smiling_face: no problem, Judd!

greg
2017-04-07 02:52
Glad to see you back hanging around

2017-04-07 02:53
I've gotta do SOMETHING with all this gear I have laying around.

greg
2017-04-07 02:53
:slightly_smiling_face:

2017-04-07 14:43
By default there's a router defined at 192.168.124.11, but I don't see any interfaces with that address. Can I ignore?

greg
2017-04-07 14:59
That is the Forwarder container's IP on the docker bridge. It allows you to use the tools/kvm-slave script.

greg
2017-04-07 15:00
In forwarder mode, you can bridge a second interface into the docker bridge to use the 124 network with external hosts.

greg
2017-04-07 15:01
You could reinstall with HOST mode (--access HOST) That will use the machine host networking for accessing out to machines. You would then create an admin network for what you already have with the external routers.

2017-04-07 15:02
what I already have?

greg
2017-04-07 15:07
I suspect that you took the defaults which is FORWARDER mode.

greg
2017-04-07 15:08
We are right now contemplating getting rid of this confusion, but that isn't ready yet.

2017-04-07 15:09
That's correct. I'm trying to plan a new install. I have 4 R410s. They have 2 nics, publicly routable. I'm considering running one as the Rebar admin + whatever workloads. Is that doable?

2017-04-07 15:09
Or is there a happier path?

2017-04-07 15:10
Well, 1 nic on each is publicly routed.

greg
2017-04-07 15:10
Yes - You probably want HOST mode with the internal nic as your internal pxe network.

2017-04-07 15:11
Seems like it.

greg
2017-04-07 15:11
You want to set the external IP to your node's internal IP.

greg
2017-04-07 15:11
Just a second - Let me find an example.

greg
2017-04-07 15:12
```curl -fsSL https://raw.githubusercontent.com/digitalrebar/digitalrebar/master/deploy/quickstart.sh | bash -s -- --con-provisioner --con-dhcp --admin-ip=1.1.2.3/24 --access=HOST```

greg
2017-04-07 15:12
Make 1.1.2.3/24 the CIDR IP form of your internal network.

greg
2017-04-07 15:13
The API/UI will listen on all interfaces by default so you can have external inet access.

greg
2017-04-07 15:13
That is the quickstart form.
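
One way to obtain the CIDR form for `--admin-ip` is to read it from `ip -o -4 addr show`. The sketch below parses that output and assembles the same flags as the quickstart command above. The interface name `eth1` and the sample address are placeholders, and the helper names are made up for illustration.

```shell
#!/bin/sh
# Read `ip -o -4 addr show dev <iface>` output on stdin and print the first
# address in CIDR form (for example 192.168.1.10/24).
first_ipv4_cidr() {
    awk '$3 == "inet" { print $4; exit }'
}

# Assemble the quickstart flags for HOST mode from a CIDR address.
quickstart_args() {
    echo "--con-provisioner --con-dhcp --admin-ip=$1 --access=HOST"
}

# Sample line in the format printed by `ip -o -4 addr show dev eth1`.
sample='3: eth1    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth1'
quickstart_args "$(printf '%s\n' "$sample" | first_ipv4_cidr)"
```

On a real host the sample line would be replaced by `ip -o -4 addr show dev eth1 | first_ipv4_cidr`, with the interface that faces the internal PXE network.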

2017-04-07 15:13
I tried to get all fancy, and confused myself.

2017-04-07 15:13
off to try the quickstart.

greg
2017-04-07 15:13
See what happens . :slightly_smiling_face: I'll be here

2017-04-07 15:13
<3

2017-04-07 15:14
I love the working name "Rocket Skates," btw.

2017-04-07 15:14
Very Wile E. Coyote.

greg
2017-04-07 15:15
That is Victor. I just renamed it last night to be more aligned. :disappointed: still hidden in places though

2017-04-07 15:34
If all my hosts already have public and private networks, then DHCP is for networks that DR will create?

2017-04-07 15:35
--con-dhcp ^

greg
2017-04-07 15:35
well - in this case, more thought is required.

greg
2017-04-07 15:35
To provision the machines, you would need to PXE somehow. Is there a DHCP server already on the internal net?

2017-04-07 15:36
None that I can't switch off.

2017-04-07 15:36
Nope, none.

greg
2017-04-07 15:36
Do the machines PXE on that net?

2017-04-07 15:36
Nope.

greg
2017-04-07 15:36
hmm

2017-04-07 15:37
I already have them all setup with Centos7, networked, all manual.

2017-04-07 15:37
Well, I setup pxe for a little bit, but shut it all down.

greg
2017-04-07 15:37
Soooo - you could set up like a cloud

2017-04-07 15:38
They're fat boxes. Lots of cores, ram, disk.

greg
2017-04-07 15:38
Yeah

greg
2017-04-07 15:39
Ok- so if you look at deploy/add-from-ssh.sh

2017-04-07 15:40
yes

2017-04-07 15:40
I've read through it all.

greg
2017-04-07 15:41
okay the scripts/join_rebar.sh has what you want to do, but it hasn't been used for a while.

greg
2017-04-07 15:41
so it is stale.

2017-04-07 15:42
I see.

2017-04-07 15:42
A freshening would be work.

greg
2017-04-07 15:42
First obvious thing is remove :3000 from everything.

2017-04-07 15:43
Oy, it assumes rebar1

greg
2017-04-07 15:43
well - yeah. -- sorry.

2017-04-07 15:44
:shipit:

2017-04-07 15:44
the quickstart was upset that my internal network is 192.168.1.0 it really wants 99

greg
2017-04-07 15:45
yeah - though it turns out you don't need it because you don't need DHCP

2017-04-07 15:46
What's the "passed in IP?" the node to be added?

greg
2017-04-07 15:47
The IP of the machine itself. This is for multihomed hosts or cloud hosts

greg
2017-04-07 15:47
cloud hosts don't know their public ips in all cases, but need to be registered with it.

2017-04-07 15:47
Ah, where the proper IP cannot be assumed.

2017-04-07 15:48
behind, NAT

greg
2017-04-07 15:48
yes

greg
2017-04-07 15:48
or for AWS they don't know their public ip, but it is the only path in.

2017-04-07 15:49
Sure.

2017-04-07 15:49
I killed all the containers and re-ran the quickstart.

2017-04-07 15:49
Now it's upset that I changed code in the repo. :-)

2017-04-07 15:50
holy cow, SMB is installed?!

greg
2017-04-07 15:51
windows install support.

2017-04-07 15:51
Seems kinda the wrong thing to expose on every interface by default.

greg
2017-04-07 15:52
probably a good request. At most places we end up with a single-interface box that handles this stuff, with forwarders to it.

2017-04-07 15:52
understood

2017-04-07 15:52
I'll roll with it for now. :-)

2017-04-07 16:03
running ./join_rebar.sh 192.168.1.10 192.168.1.10 fails

2017-04-07 16:03
I wanna add the admin node as a workload destination. But it seems like a dumb idea, because rebar's taking up so many ports. Should prolly be a VM.

greg
2017-04-07 16:04
yeah

2017-04-08 12:42
I'm trying to make a nice VM for dainty little rebar, but I'm having trouble routing out from the VM. HALP, my head is getting bloody from banging: http://stackoverflow.com/questions/43294216/bridged-libvirt-vms-cannot-route-out

2017-04-08 14:21
Nevermind - seems like I do need the NAT.

2017-04-09 21:41
If I don't see any DHCP subnets in my GUI, did install fail or something?

2017-04-09 22:01
how do I run the role to install raid-tools?

2017-04-09 22:05
Oh, nevermind. :-) I guess I need some nodes first.

2017-04-09 22:58
The hint at the quickstart should probably say if my_key.key should be the public or private key.

2017-04-09 22:59
If I mkdir -p the directory path to the keyfile to pre-seed, then quickstart fails on "directory detected"

2017-04-09 23:16
during install, rule-engine:master goes up and down a lot... expected? a793838d8e11 digitalrebar/rule-engine:master "/sbin/docker-entr..." 2 minutes ago Restarting (1) 12 seconds ago compose_rule-engine_1

2017-04-10 01:35
:-)

2017-04-10 01:36
I've got VMs and Metal all booting off my Rebar on a VM.

greg
2017-04-10 02:08
Nice!

2017-04-10 02:09
Yes, happy dance.

2017-04-10 02:10
But I don't have any BMC ports on my metal plugged in.

2017-04-10 02:10
no ipmi.

2017-04-10 02:10
Will rebar fail over to logging into sledgehammer and rebooting?

greg
2017-04-10 02:15
Hmmm - I'm not sure. I think it will attempt to talk. Did you create a BMC network? It may not because there isn't a network to use.

2017-04-10 02:27
Good work @newgoliath !

2017-04-10 02:29
Thanks, @zehicle .

2017-04-10 02:30
@galthaus - I didn't create a BMC network.

2017-04-10 02:36
doing an 'Install OS' just gives a false positive in the Deployment. Sledgehammer never reboots.

2017-04-10 02:37
does it think the machine is off?

2017-04-10 02:37
nope

2017-04-10 02:38
On ready

2017-04-10 02:38
I suspect it's trying to use the BMC to reboot and fails. so it knows the machine is not off

2017-04-10 02:39
I'm not sure where hammer errors would get surfaced. I suspect if you try to power off from the UX or API then you'll get an error message

2017-04-10 02:40
the reinstall combines a few steps together. it's possible that we're eating the error

2017-04-10 02:43
No response to a "cycle" click.

2017-04-10 02:45
rebooting the sledgehammer via ssh

2017-04-10 02:46
I've got a serial console on it, watching it boot.

2017-04-10 02:46
for your configuration you may need to manually set the nodes to use the SSH hammer.

2017-04-10 02:47
or mess w/ the hardware profiles so that they do not recognize your gear as a known type

2017-04-10 02:52
too bad sledgehammer doesn't send output to the serial console by default. I'd love to watch it boot.

2017-04-10 02:55
rebooting sledgehammer via ssh returns the node up, but no ssh.

2017-04-10 02:56
there's interaction between the provisioner & hammer so that it knows that you've started a reboot cycle

2017-04-10 02:56
unless nextboot changes, you'll keep going back to sledgehammer

2017-04-10 02:57
sledgehammer is up - but ssh port is closed.

2017-04-10 02:57
host pings ok

2017-04-10 03:00
If I destroy the deployment and send it back to system, you think it will open up ssh?

2017-04-10 03:00
I have no idea - very strange for the image to have ssh

2017-04-10 03:01
It's always reaching back to rebar for instructions, right?

2017-04-10 03:15
@zehicle how do I change to the SSH hammer?

2017-04-10 03:16
I think you can nodes update X set {"hammer":"ssh"} or similar.... checking

2017-04-10 03:22
@zehicle interestingly, it's got "quirks ipmi-dell-dedicated-nic and ipmi-nodelay"

2017-04-10 03:29
hammer is not an attrib.

2017-04-10 04:17
sorry, it's actually a subtype nodes/#/hammers

2017-04-10 04:17
or just api/v2/hammers

2017-04-10 04:17
I think you can delete the offending hammer(s)

2017-04-10 04:18
leave SSH in place (assuming it's there) - @VictorLowther will have better suggestions

2017-04-10 04:19
I seem to have changed it, because there's a whole new complaint:

2017-04-10 04:20
rebar-access role: RuntimeError: Did not create remote_tmpdir on da4-ba-db-3e-93-ac.local.neode.org for some reason! (ssh: connect to host 192.168.1.110 port 22: Connection refused )

zehicle
2017-04-10 04:22
could be a template generation problem

zehicle
2017-04-10 04:23
from the provisioner - these things are all connected together

2017-04-11 17:46
All, we've got the Digital Rebar Provision code ready for community feedback! We'd love for you all to take a look and give it a try. It's just DHCP/PXE intended as a cobbler replacement that will be the v3 DR provisioner.

2017-04-11 17:46
https://robhirschfeld.com/2017/04/11/provision-preview/

wdennis
2017-04-12 22:06
Hey @zehicle - what needs to be done with Provision to get it to use an outboard DHCP server? Same stuff as Cobbler (set relevant next host etc?)

greg
2017-04-12 22:30
Yes - You need to have your DHCP point to the IP of provision as the next server. Specify option 67 as lpxelinux.0 (bootfile)

greg
2017-04-12 22:31
You can then use --disable-dhcp on the command line when you run dr-provision

2017-04-12 23:09
http://digital-rebar.readthedocs.io/en/latest/deployment/old/external-services.html ctrl+f pxelinux

2017-04-12 23:09
Should that be lpxelinux.0?

2017-04-12 23:10
since I don't see pxelinux.0 in `./.cache/digitalrebar/tftpboot/` and nothing in digitalrebar/digitalrebar that indicates it's used

2017-04-12 23:11
I do notice that discovery is just a symlink to its parent directory though, does it matter if it's specified in the dhcp server's config?

greg
2017-04-12 23:16
difference between dr-provision and digitalrebar.

greg
2017-04-12 23:17
dr-provision's tftpboot directory has it all in the top directory.

2017-04-12 23:18
those are the digitalrebar docs aren't they? I'm using digitalrebar/digitalrebar, not digitalrebar/provision

greg
2017-04-12 23:19
Sorry - Thought you were referring to @wdennis above.

greg
2017-04-12 23:20
@Iae - it does not matter, I think. They both reference the same layout.

greg
2017-04-12 23:20
The main thing is that lpxelinux.0 can get to the pxelinux.cfg/default file or appropriate per node config files

2017-04-12 23:21
ok, but lpxelinux.0 has to be specified in the dhcp server config and not pxelinux.0

greg
2017-04-12 23:29
Yes - the DHCP server needs to have the EXTERNAL_IP of DigitalRebar as the next server in the DHCP response and option 67 (bootfile) needs to be lpxelinux.0 (for legacy BIOS boots).
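
For an external ISC dhcpd server, the two settings described above could look like the fragment this sketch prints. It is an illustration under assumptions: ISC dhcpd syntax, placeholder subnet and addresses, and a legacy BIOS client. Other servers (pfSense, infoblox) expose the same two settings through their own UI. A real subnet block would also need a range, routers and so on, which are left out here.

```shell
#!/bin/sh
# Print an ISC dhcpd subnet fragment that points PXE clients at the provisioner.
# Arguments: subnet, netmask, provisioner IP (all illustrative values).
dhcpd_pxe_fragment() {
    cat <<EOF
subnet $1 netmask $2 {
  next-server $3;
  filename "lpxelinux.0";
}
EOF
}

dhcpd_pxe_fragment 192.168.1.0 255.255.255.0 192.168.1.10
```

`next-server` carries the provisioner's address and `filename` carries the boot file name handed to the client.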

greg
2017-04-12 23:30
The digitalrebar dhcp server and dr-provision use go template expansion to inspect the incoming packet to determine what file should be sent. This can be seen in the UI if you configure an admin network.

greg
2017-04-12 23:30
Anyway, dinner and soccer for a few hours.

wdennis
2017-04-13 16:41
So, looking to trial Provision on the same server that I have full DR running on -- I've downed DR by running 'docker-compose stop' in the deploy/compose dir - that's all that's needed to bring DR down, correct?

greg
2017-04-13 16:46
yes

wdennis
2017-04-13 16:50
Cool. Now I can curlbash the Provision into the same 'digitalrebar' directory, or should I use another dir above that?

greg
2017-04-13 16:51
different dir

greg
2017-04-13 16:52
use the --isolated flag.

greg
2017-04-13 16:52
it will keep everything in that directory for now.

wdennis
2017-04-13 16:52
Cool, thx

greg
2017-04-13 16:53
np

wdennis
2017-04-13 16:55
Getting a 'No package p7zip-full available' error

greg
2017-04-13 16:55
what OS?

wdennis
2017-04-13 16:55
This on CentOS 7.x

greg
2017-04-13 16:55
try: yum install -y p7zip

wdennis
2017-04-13 16:56
Yup, that worked

greg
2017-04-13 16:56
sigh - it is p7zip-full on ubuntu/debian

greg
2017-04-13 16:56
just p7zip on centos
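
A sketch of how an install script might pick the right package name per package manager, based only on the two cases named above. The function names are hypothetical and this is not the actual fix that went into the installer.

```shell
#!/bin/sh
# Print the 7-Zip package name for a given package manager.
p7zip_package_for() {
    case "$1" in
        apt-get) echo "p7zip-full" ;;   # Ubuntu/Debian
        yum)     echo "p7zip" ;;        # CentOS/RHEL
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the first supported package manager found on this host.
detect_package_manager() {
    for pm in apt-get yum; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    return 1
}
```

Usage could be `pm="$(detect_package_manager)" && sudo "$pm" install -y "$(p7zip_package_for "$pm")"`.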

greg
2017-04-13 16:56
fixing now.

wdennis
2017-04-13 16:57
Love this differing pkg names across distros thing :-/

greg
2017-04-13 16:58
yeah- I did a replace and was too aggressive.

greg
2017-04-13 16:59
in about 15 minutes, all the packages and tree will be updated. It will work. Since you already have it installed now, it should be fine to continue.

wdennis
2017-04-13 16:59
How to continue?

greg
2017-04-13 16:59
Rerun

wdennis
2017-04-13 16:59
Ok

greg
2017-04-13 16:59
It tests for the package and skips if already there.

greg
2017-04-13 16:59
the binary that is.

wdennis
2017-04-13 17:04
Is the '--data-root=' line in readthedocs QuickStart incorrect now?

greg
2017-04-13 17:05
It should be. The install script should have kicked out an example.

wdennis
2017-04-13 17:05
References "discovery-load.sh" instead of "digitalrebar"

greg
2017-04-13 17:05
Line right above it

wdennis
2017-04-13 17:06
It did, and I went with the outputted example

greg
2017-04-13 17:06
okay so is it running? :slightly_smiling_face:

greg
2017-04-13 17:06
export RS_KEY=rocketskates:r0cketsk8ts

greg
2017-04-13 17:06
./drpcli users list

greg
2017-04-13 17:07
./drpcli prefs list

wdennis
2017-04-13 17:07
Yes running

greg
2017-04-13 17:07
Should be able to do:

greg
2017-04-13 17:07
in browser

greg
2017-04-13 17:08
https://<ip>:8092/ui/?token=rocketskates:r0cketsk8ts

wdennis
2017-04-13 17:08
Yup dricli cmds work

wdennis
2017-04-13 17:08
*drpcli

greg
2017-04-13 17:08
then run: tools/discovery-load.sh

greg
2017-04-13 17:09
That will put the pieces in place to do discovery.

wdennis
2017-04-13 17:09
Cool

greg
2017-04-13 17:09
Did the browser thingee work?

wdennis
2017-04-13 17:10
Hold on, let me try

wdennis
2017-04-13 17:12
No, URL not working

wdennis
2017-04-13 17:13
I do see ports 8091/8092 listening on server

greg
2017-04-13 17:13
https://<ip>:8092/swagger-ui

wdennis
2017-04-13 17:14
Nope

wdennis
2017-04-13 17:15
Firewall maybe? Will nmap DR server

wdennis
2017-04-13 17:15
Yup, only SSH port open...

greg
2017-04-13 17:16
okay - are these local firewall rules on your box?

greg
2017-04-13 17:16
Err - dr server?

wdennis
2017-04-13 17:17
Yup - firewalld

greg
2017-04-13 17:17
okay - Note to self update docs to talk about open port reqs.

greg
2017-04-13 17:17
I indirectly list them in the docs, but ...

wdennis
2017-04-13 17:17
Stopped that, will try again

greg
2017-04-13 17:18
you will need to open ports, udp 67, udp 69, tcp 8091, tcp 8092

greg
2017-04-13 17:18
The last two are configurable (well all are configurable, but tftp and dhcp WIGGGGG OUTTTTT if you change them from 69 and 67 respectively).
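
As an alternative to stopping firewalld entirely, the four ports listed above could be opened with `firewall-cmd`. A sketch, assuming the default zone is the right one and the default port numbers are in use; it prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run helper: print each command unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

open_provision_ports() {
    # DHCP (67/udp), TFTP (69/udp), and the two TCP ports named above.
    for port in 67/udp 69/udp 8091/tcp 8092/tcp; do
        run firewall-cmd --permanent --add-port="$port"
    done
    # Permanent rules only take effect after a reload.
    run firewall-cmd --reload
}

open_provision_ports
```

If the TCP ports were changed from their defaults, the list in the loop needs to change with them.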

wdennis
2017-04-13 17:19
Ok, tried swagger, got UI but error on screen "Can't read from server. It may not have the appropriate access-control-origin settings."

greg
2017-04-13 17:19
sigh - need to adjust that. The text box has a URL that defaults to localhost or something like it. Put the IP of the host in that field instead.

wdennis
2017-04-13 17:20
Regular (?) UI page working

greg
2017-04-13 17:20
be sure to add ?token=rocketskates:r0cketsk8ts

wdennis
2017-04-13 17:21
Ok, in

greg
2017-04-13 17:21
There should have been a little dialog for you to enter your username/password or token.

greg
2017-04-13 17:21
It is small. This is a work in progress.

wdennis
2017-04-13 17:22
Do I have to declare a subnet if not doing DHCP thru Provision?

greg
2017-04-13 17:22
oh -- yeah.

greg
2017-04-13 17:22
you should be okay.

greg
2017-04-13 17:22
Make sure preferences are set to sledgehammer for default and discovery for unknown.

wdennis
2017-04-13 17:22
So, don't have to, right?

greg
2017-04-13 17:23
correct.

greg
2017-04-13 17:23
You need to have next server point to your IP and bootfile should be lpxelinux.0

wdennis
2017-04-13 17:23
Yup, SH in defaultBootEnv

greg
2017-04-13 17:24
in the bootenvs section discovery, local, and sledgehammer should be available.

wdennis
2017-04-13 17:24
Let me mod my routers DHCP settings

greg
2017-04-13 17:24
Does your DHCP server respond with DNS settings like Domain Name and DNS server?

wdennis
2017-04-13 17:25
Nothing in 'local' showing...

greg
2017-04-13 17:25
Yeah - it should be just available.

wdennis
2017-04-13 17:26
Yes, DHCP server does provide those (it's a pfSense box.)

greg
2017-04-13 17:26
okay - then you are even betterer

greg
2017-04-13 17:26
Another note to self - document external DHCP server case.

greg
2017-04-13 17:26
We need DNS domain name when we make up a name.
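
When an external DHCP server is used, the three things mentioned above are the next server (the dr-provision IP), the bootfile (`lpxelinux.0`), and a DNS domain name. The thread's DHCP server is a pfSense box configured through its GUI, so the following is just an illustrative sketch for the case of an ISC dhcpd style config. The helper name `dhcpd_pxe_fragment` and the `example.com` domain are assumptions, not something from the DRP docs.

```shell
# Print an ISC dhcpd-style fragment pointing PXE clients at dr-provision.
# $1 = dr-provision server IP (bare address, no /24), $2 = DNS domain name.
dhcpd_pxe_fragment() {
  drp_ip=$1
  domain=$2
  cat <<EOF
option domain-name "$domain";
next-server $drp_ip;
filename "lpxelinux.0";
EOF
}
```

For example, `dhcpd_pxe_fragment 192.168.1.148 example.com` prints the three statements to paste into the relevant subnet block.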

greg
2017-04-13 17:30
so - I'm walking you through discovery, but you can also add machines directly.

greg
2017-04-13 17:30
If you know the IP a machine is coming in on, you can create a machine by IP and tell it the bootenv to run when contacted. That works as well.

wdennis
2017-04-13 17:32
Ok, time to boot some bare metal and see what happens :)

greg
2017-04-13 17:33
okay then :slightly_smiling_face:

wdennis
2017-04-13 17:33
When sledgehammer runs, where does it store the discovery info?

greg
2017-04-13 17:34
Okay - so dr-provision is much slimmer. Its purpose is just provisioning and installing OS. It is not meant to inventory and so forth. That is the next step in our integration path. Provision will hook into DigitalRebar for that path.

greg
2017-04-13 17:35
It will create a machine in dr-provision, but it won't have the full inventory data like DR does.

wdennis
2017-04-13 17:35
Ah, I see

wdennis
2017-04-13 17:37
So am waiting on PXE boot now... Dell PE R510 boot is sloooooow

2017-04-13 17:37
Time to feed the :bear:!

wdennis
2017-04-13 17:39
Loading sledgehammer/.... :)

wdennis
2017-04-13 17:40
Hmmm... failed on loading second-stage initramfs

wdennis
2017-04-13 17:41
wget: can't connect to remote host (IP): connection refused

greg
2017-04-13 17:41
Does it say what ip?

greg
2017-04-13 17:41
Also need another param


greg
2017-04-13 17:42
Did you start dr-provision with the static ip param?

wdennis
2017-04-13 17:43
Yes

greg
2017-04-13 17:43
Is 1.148 the DRP server?

wdennis
2017-04-13 17:43
Yes, that's the Provision svr

greg
2017-04-13 17:44
okay

greg
2017-04-13 17:44
why :80? hmm


greg
2017-04-13 17:48
okay - can you cat the tftpboot/pxelinux.cfg/default file?

wdennis
2017-04-13 17:50
Does not exist

greg
2017-04-13 17:50
okay - we should have default. That is what is being strange, I think.

greg
2017-04-13 17:51
looking at it now.


greg
2017-04-13 17:52
yeah - I recreated it here. I guess we just broke something recently. More unit tests to add.

wdennis
2017-04-13 17:52
Always more tests :)

greg
2017-04-13 17:57
try this:

greg
2017-04-13 17:57
./drpcli prefs set unknownBootEnv ignore

greg
2017-04-13 17:57
./drpcli prefs set unknownBootEnv discovery

greg
2017-04-13 17:58
See if that creates the file.

wdennis
2017-04-13 18:00
Nope

greg
2017-04-13 18:00
ok


greg
2017-04-13 18:18
I'm silly - it isn't rendered that way anymore .

greg
2017-04-13 18:18
sigh


greg
2017-04-13 18:24
I think I see the busted-ness. We switched to inline rendering and it isn't working on a shift.

greg
2017-04-13 18:24
We don't use the filesystem for content anymore. Should add preview for bootenvs and templates.

greg
2017-04-13 18:26
Notes so far:
1. Document port requirements for DRP.
2. Document DHCP requirements when not using DRP as DHCP (option 15 (discovery), bootfile, nextserver).
3. Bug in changing unknown bootenv and rendering content. Changed ignore to discovery and ignore is still being served.

wdennis
2017-04-13 18:30
So, at this moment, busted & cant continue?

greg
2017-04-13 18:32
hmm - try this.

greg
2017-04-13 18:33
```
./drpcli prefs list
./drpcli prefs set unknownBootEnv ignore
curl http://127.0.0.1:8091/pxelinux.cfg/default
./drpcli prefs list
./drpcli prefs set unknownBootEnv discovery
curl http://127.0.0.1:8091/pxelinux.cfg/default
```

greg
2017-04-13 18:44
The discovery file should look like this:
```
DEFAULT discovery
PROMPT 0
TIMEOUT 10
LABEL discovery
  KERNEL sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/vmlinuz0
  INITRD sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/stage1.img
  APPEND rootflags=loop root=live:/sledgehammer.iso rootfstype=auto ro liveimg rd_NO_LUKS rd_NO_MD rd_NO_DM provisioner.web=http://127.0.0.1:8091 rs.api=https://127.0.0.1:8092
  IPAPPEND 2
```

greg
2017-04-13 18:45
The 127.0.0.1 is because my request came from 127.0.0.1

greg
2017-04-13 19:00
I'm now wondering if the tftp error was an issue. You could also make sure that your pref is discovery and reboot.

greg
2017-04-13 19:00
See if we hit the tftp error again.

wdennis
2017-04-13 19:33
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F4ZP4RLRM/image_uploaded_from_ios.jpg and commented: Looks good?

greg
2017-04-13 19:35
yeah if provisioner.web=http://127.0.0.1:8091 on the second to last line

wdennis
2017-04-13 19:36
Should that not be the public IP?

greg
2017-04-13 19:36
We build the reply based upon the dest IP.

greg
2017-04-13 19:36
run it with your node's IP.

greg
2017-04-13 19:36
curl http://<node's IP>:8091/pxelinux.cfg/default

greg
2017-04-13 19:37
it should pop out with a different IP.
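
Since the rendered file embeds the address the request arrived on, a quick way to see what a client will be told is to pull the `provisioner.web` and `rs.api` values out of the response. A minimal sketch follows, where `boot_arg` is a made-up helper name that reads any text containing `key=value` tokens on stdin.

```shell
# Print the value of the first key=value token on stdin whose key is $1.
# Works on a rendered pxelinux config or on a kernel command line.
boot_arg() {
  # Escape dots so "provisioner.web" is matched literally.
  key=$(printf '%s' "$1" | sed 's/\./\\./g')
  tr ' ' '\n' | sed -n "s/^${key}=//p" | head -n 1
}
```

For example, `curl -s http://<ip>:8091/pxelinux.cfg/default | boot_arg provisioner.web` shows the URL handed to PXE clients, and `boot_arg provisioner.web < /proc/cmdline` does the same on an already booted node, assuming `tr`, `sed`, and `head` are available there.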

wdennis
2017-04-13 19:40
Node being the Provision server, or the client server?

greg
2017-04-13 19:40
provisioner server

wdennis
2017-04-13 19:41
Ok, did change the URL

greg
2017-04-13 19:42
Reboot the node and look for the tftp error in the drp log

wdennis
2017-04-13 20:07
The PXE client boot did fail again in the same way- that was expected?

greg
2017-04-13 20:29
hmm - I guess it is good, but was hoping not.

greg
2017-04-13 20:34
sorry for the delay. @wdennis - can you run a command on the system stuck in ash

greg
2017-04-13 20:34
cat /proc/cmdline

greg
2017-04-13 20:34
grep -o 'provisioner.web=[^ ]*' /proc/cmdline

wdennis
2017-04-13 20:36
Will do when I can in a bit - taking kidddo to karate :)

greg
2017-04-13 20:36
np

wdennis
2017-04-14 00:31

wdennis
2017-04-14 00:32
here you go, @greg

greg
2017-04-14 00:35
Thx. If you added /24 to the command line of dr-provision, please stop the server, remove it, and restart.

wdennis
2017-04-14 00:38
I did? will do.

greg
2017-04-14 00:42
Another thing to add to the checks on start up :slightly_smiling_face:

wdennis
2017-04-14 00:46
did the curlbash output say to put "/24" at the end of the IP, or did I just goof?

wdennis
2017-04-14 00:46
readthedocs site does not have that

greg
2017-04-14 00:59
It should not. The readthedocs should not have it either. I bet it might have. My bad.

wdennis
2017-04-14 01:03
Unfortunately, I typed in something in ash that caused it to coredump... And I don't have remote power control on that server I was using :disappointed:

wdennis
2017-04-14 01:04
hangs head in shame

wdennis
2017-04-14 01:14
This looks like the problematic bit in `install.sh`:

wdennis
2017-04-14 01:14
```
[dradmin@dr-admin ~]$ /sbin/ip -o -4 addr show scope global |head -1 |awk '{print $4}'
192.168.1.148/24
```

greg
2017-04-14 01:14
Yeah I lifted too much code. Argh.
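
The pipeline above prints the address in CIDR form, so the `/24` ends up in the suggested static IP. One way to drop the suffix is shell parameter expansion. This is a sketch of the idea, not the actual fix that went into `install.sh`, and `strip_prefix_len` is a made-up name.

```shell
# Drop a CIDR suffix such as /24, leaving only the bare address.
# An address without a suffix is returned unchanged.
strip_prefix_len() {
  printf '%s\n' "${1%%/*}"
}
```

Wrapped around the original command, that would look like `strip_prefix_len "$(/sbin/ip -o -4 addr show scope global | head -1 | awk '{print $4}')"`.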

wdennis
2017-04-14 01:15
That's why the community needs to test :slightly_smiling_face:

greg
2017-04-14 01:16
Well thanks and sorry.

wdennis
2017-04-14 01:16
np

wdennis
2017-04-14 01:17
Unfortunately unless I run into work (~15 min drive) the game's over until Mon... We'll see what the wife thinks...

wdennis
2017-04-14 01:18
There's a bit of other work (wiring work) I could stand to do there...

greg
2017-04-14 01:18
I'm in no hurry. Marital harmony is important.

wdennis
2017-04-14 01:19
yes it is :slightly_smiling_face:

wdennis
2017-04-14 01:20
So may I ask what's involved with installing a distro image and doing kickstart/preseed?

greg
2017-04-14 01:20
cd assets

greg
2017-04-14 01:21
drpcli bootenvs install bootenvs/ubuntu16.04.yml

greg
2017-04-14 01:21
Wait awhile

greg
2017-04-14 01:22
Then change machines bootenv to Ubuntu. And then reboot node

greg
2017-04-14 01:22
Wait a while. When the machine's bootenv goes to local, it's done.

greg
2017-04-14 01:22
The templates and bootenvs work mostly like DR

greg
2017-04-14 01:23
Please forgive phone typing

greg
2017-04-14 01:24
Examples in assets dir


wdennis
2017-04-14 13:47
And I see the machine in the UI

wdennis
2017-04-14 13:50
API Help links in the UI are broken...


greg
2017-04-14 13:54
Yeah!!!!!

2017-04-14 13:55
I'll fix those links! Thanks.

wdennis
2017-04-14 14:14
Now into the Ubuntu 16.04 install...

wdennis
2017-04-14 14:19
Looks like we have a successful installation :)

greg
2017-04-14 14:19
Nice!

wdennis
2017-04-14 14:20
When the machine boots post-install, does it communicate back to Provision and set BootEnv to be 'local'?

greg
2017-04-14 14:20
yes

greg
2017-04-14 14:21
no - local is set as last step of install in ks or preseed

greg
2017-04-14 14:21
ignore the yes

wdennis
2017-04-14 14:22
Yup, I see local now

wdennis
2017-04-14 14:23
Cool

greg
2017-04-14 14:23
I think for ubuntu - rocketskates/r0cketsk8ts

greg
2017-04-14 14:23
default login.

wdennis
2017-04-14 14:23
Not the 'rebar' one?

greg
2017-04-14 14:24
I think I changed it.

greg
2017-04-14 14:24
You can change it by adding params (either to the machine or globally).

greg
2017-04-14 14:24
Each bootenv defines a set of parameters that can be injected.

greg
2017-04-14 14:24
some are required and some are optional. These are optional (with defaults).

greg
2017-04-14 14:25
You can specify the user (for ubuntu) and the password hash.

greg
2017-04-14 14:25
Those are next on my doc list.

wdennis
2017-04-14 14:25
Where to set param's?

greg
2017-04-14 14:25
drpcli params list

greg
2017-04-14 14:25
those are global (it is empty by default).

greg
2017-04-14 14:26
drpcli machines update <uuid> <json blob of parameters>

greg
2017-04-14 14:26
I should make that better one day soon.

wdennis
2017-04-14 14:26
Cannot seem to log in with the rocketskates user/passwd (or the rebar one)

greg
2017-04-14 14:27
rebar/rebar1

greg
2017-04-14 14:27
:slightly_smiling_face:

greg
2017-04-14 14:27
I don't remember - I'll have to check.

wdennis
2017-04-14 14:28
Nope :)

greg
2017-04-14 14:28
What stupid thing did I do.

greg
2017-04-14 14:29
rocketskates is the user

greg
2017-04-14 14:29
rocketskates

greg
2017-04-14 14:30
r0cketsk8ts

wdennis
2017-04-14 14:31
Does not work...

greg
2017-04-14 14:31
okay sigh arg

greg
2017-04-14 14:32
Time to fix it and put it in the docs.

greg
2017-04-14 14:37
Just a second


wdennis
2017-04-14 14:37
Where do I see the .Param 's used?

greg
2017-04-14 14:39
in this case, it is unset and uses the default in the file.

greg
2017-04-14 14:40
```./drpcli params create '{ "Name": "provisioner-default-password-hash", "Value": "$6$5trJ0SAGobo9YE1X$r31iOEokINeaYTCbtyfpCZU6wgKoK7Tr3mc7CIFl7hDPP6LNPkUveg3hB2vasE.H5IBwvW5qBK8aQz5imnp8J0" }'```

greg
2017-04-14 14:40
That will create a new global parameter which represents r0cketsk8ts

greg
2017-04-14 14:40
You could try: R0cketSk8ts

greg
2017-04-14 14:41
Anyway, the parameter call will set the global.

wdennis
2017-04-14 14:41
Nada

greg
2017-04-14 14:42
then do: drpcli machines update <uuid> '{ "BootEnv": "ubuntu-16.04-install" }'

greg
2017-04-14 14:42
reboot the node and then log in when it is done.

greg
2017-04-14 14:43
Wait

wdennis
2017-04-14 14:43
Oh that'll make the node do a reinstall?

greg
2017-04-14 14:43
RocketSkates is the password

greg
2017-04-14 14:43
Yes

wdennis
2017-04-14 14:43
Yuuuuup :)

greg
2017-04-14 14:43
Okay - I'm gonna doc that - it was in our git commit log history.

wdennis
2017-04-14 14:44
Lol

wdennis
2017-04-14 14:46
OK, this is good... Now, how to spec differing preseed templates for different nodes?

wdennis
2017-04-14 14:47
We have groups that want specific disk partitioning, etc

greg
2017-04-14 14:48
You need custom bootenvs for each one.

greg
2017-04-14 14:48
So you would create a template for each type/group - then a bootenv that refs that template. Then set the node's bootenv to that bootenv.

wdennis
2017-04-14 14:49
I try to do as little as possible in kickstart/preseed, and really config node via Ansible, but disk partitioning isn't something that can be done that way :)

greg
2017-04-14 14:49
Yep - our philosophy as well.

wdennis
2017-04-14 14:50
Any ideas about pulling in config blocks within templates?

greg
2017-04-14 14:50
Well, two thoughts. One is parameters with inject.

wdennis
2017-04-14 14:50
Like you can still use the same basic template, but the disk partitioning can be a sub-template if you will...

greg
2017-04-14 14:51
In theory, the templates can contain other templates (from a golang template spec perspective), but I don't think we've hooked that together. So one day that way.

wdennis
2017-04-14 14:51
Also would handle multi-disk setups

greg
2017-04-14 14:51
Today, you could build a bootenv with a big string explosion and put that as a parameter. The parameter could then be injected by node type.

wdennis
2017-04-14 14:53
Define "big string explosion" please :)

wdennis
2017-04-14 14:54
I'd hate to have to make/maintain 'n' templates that are basically 90%+ same, with just minor changes

greg
2017-04-14 14:55
give me a second

wdennis
2017-04-14 14:55
Cobbler has the concept of 'snippets' that are config blobs that can be put into a static template kickstart

greg
2017-04-14 14:56
Yeah - template in template will be the way of that.

greg
2017-04-14 14:56
We just aren't quite there yet.

greg
2017-04-14 14:57
Instead you would need this:

greg
2017-04-14 14:58
Say I had this in my preseed:

greg
2017-04-14 14:58
```
d-i clock-setup/ntp boolean false
{{end}}
{{if .ParamExists "operating-system-disk"}}
d-i partman-auto/disk string {{ .Param "operating-system-disk" }}
{{else}}
d-i partman-auto/disk string /dev/sda
{{end}}
d-i partman-auto/method string lvm
d-i partman-lvm/device_remove_lvm boolean true
d-i partman-lvm/device_remove_lvm_span boolean true
d-i partman-auto/purge_lvm_from_device boolean true
d-i partman-md/device_remove_md boolean true
d-i partman-lvm/confirm boolean true
d-i partman-lvm/confirm_nochanges boolean true
d-i partman-lvm/confirm_nooverwrite boolean true
d-i partman-auto-lvm/guided_size string max
d-i partman-auto-lvm/new_vg_name string {{ .Machine.ShortName }}
d-i partman-auto/choose_recipe select custom_lvm
d-i partman/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i partman/auto expert_recipe string \
custom_lvm:: \
500 50 1024 free $iflabel{ gpt } $reusemethod{ } method{ efi } format{ } . \
128 50 256 ext2 $defaultignore{ } method{ format } format{ } use_filesystem{ } filesystem{ ext2 } mountpoint{ /boot } . \
10240 20 10240 ext4 $lvmok{ } mountpoint{ / } lv_name{ root } in_vg{ {{ .Machine.ShortName }} } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \
50% 20 100% linux-swap $lvmok{ } lv_name{ swap } in_vg{ {{ .Machine.ShortName }} } method{ swap } format{ } .
{{if (and (eq "ubuntu" .Env.OS.Family) (lt "12.10" .Env.OS.Version))}}
d-i live-installer/net-image string {{.Env.InstallUrl}}/install/filesystem.squashfs
```

greg
2017-04-14 14:58
I could replace it with this:

greg
2017-04-14 15:00
```
d-i clock-setup/ntp boolean false
{{end}}
{{ .Param "my-disk-layout-string" }}
{{if (and (eq "ubuntu" .Env.OS.Family) (lt "12.10" .Env.OS.Version))}}
d-i live-installer/net-image string {{.Env.InstallUrl}}/install/filesystem.squashfs
```

greg
2017-04-14 15:00
Then do:

greg
2017-04-14 15:01
./drpcli params create '{ "Name": "my-disk-layout-string", "Value": "custom stuff here - no params though" }'

greg
2017-04-14 15:01
That would be the default

greg
2017-04-14 15:01
Then for each machine that needed custom from default.

greg
2017-04-14 15:02
./drpcli machines update <uuid> '{ "Params": { "my-disk-layout-string": "other custom form" } }'
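
A real disk layout is several lines of preseed with quotes and backslashes in it, so the JSON blob needs escaping before it goes into the `machines update` call above. A minimal sketch of building that blob in plain shell follows. The function names are made up, and it only handles backslashes, double quotes, and newlines (not tabs or other control characters). As greg notes, the value is inserted as-is, so it cannot itself contain template params.

```shell
# Escape a string for use inside a JSON string literal.
# Handles backslash, double quote, and newline only.
json_escape() {
  printf '%s' "$1" \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "\\n" } { printf "%s", $0 }'
}

# Build the { "Params": { name: value } } blob shown in the chat.
# $1 = parameter name, $2 = raw (possibly multi-line) value.
machine_params_json() {
  printf '{ "Params": { "%s": "%s" } }\n' "$1" "$(json_escape "$2")"
}
```

With the layout in a shell variable, the per-machine override would then be `./drpcli machines update <uuid> "$(machine_params_json my-disk-layout-string "$layout")"`.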

greg
2017-04-14 15:02
This is giving me some good ideas.

greg
2017-04-14 15:02
for some features.

greg
2017-04-14 15:03
make those commands easier

greg
2017-04-14 15:03
template in template

greg
2017-04-14 15:03
docs

wdennis
2017-04-14 15:07
Totally should work on that Tmpl-in-tmpl stuff :)

wdennis
2017-04-14 15:08
And all this work will port into future DR?

greg
2017-04-14 15:09
Yeah - eventually, the plan is to have this replace the DHCP/Provisioner in DR with a real provider.

greg
2017-04-14 15:09
The hope is to start with this to get moving - it should be simpler to start using and it will work on multiple sites without the back haul inet perf/bandwidth hit.

greg
2017-04-14 15:10
Local control and all that.

wdennis
2017-04-14 15:10
It's the gateway drug for full-on DR ;)

greg
2017-04-14 15:11
well - we dream

wdennis
2017-04-14 15:32
So the templates are Golang "text/template" ?

greg
2017-04-14 15:39
yes

greg
2017-04-14 15:43
@wdennis - you want something like:

greg
2017-04-14 15:44
{{ .Template (.Param "disk-template") }}

greg
2017-04-14 15:44
Where .Param would be a string parameter that is the name of a template to include in this template.

wdennis
2017-04-14 15:45
Can the resulting master template be rendered to see what it will produce? (Not a Go dev...)

greg
2017-04-14 15:45
Yeah - that was another idea that I was thinking about. A template preview option.

greg
2017-04-14 15:46
bootenv preview option as well.

wdennis
2017-04-14 15:46
Great idea

greg
2017-04-14 15:50
You can open issues in github for things you see as enhancements as well. I'm doing these now.

wdennis
2017-04-14 15:50
Yup, will do

greg
2017-04-14 16:04
okay - I think I created the ones that we talked about.

2017-04-14 18:57
Is there public documentation for connecting the docker dhcp container to the "Real" network so it can get and respond to helper requests hitting the routable ip already? Easy as just making docker bridge in to "eth0" on the host?

greg
2017-04-14 19:00
If you are using FORWARDER mode, yes, you can bridge the eth1 into docker0. Then the 192.168.124.0/24 network will be served on that interface as well.

2017-04-14 19:01
Running host mode in theory. forwarder simpler?

greg
2017-04-14 19:01
If you are using HOST mode, the DHCP server is already listening on the interfaces. You may need to create a specific admin network for that interface.

2017-04-14 19:01
I see the host binds on the host

2017-04-14 19:01
udp 0 0 192.168.122.1:53 0.0.0.0:* 1400/dnsmasq
udp 0 0 0.0.0.0:67 0.0.0.0:* 1400/dnsmasq

2017-04-14 19:01
k

2017-04-14 19:01
specifically bind it to "ens3" ?

greg
2017-04-14 19:02
Actually, you just have to create the network in the UI or CLI.

greg
2017-04-14 19:03
with the proper ranges and it will just start listening on that interface.

greg
2017-04-14 19:03
That may not make complete sense.

2017-04-14 19:03
What does DR consider proper ranges? maybe thats the issue

2017-04-14 19:03
does it support dhcp-relay with "External" ranges?

greg
2017-04-14 19:03
Yes

greg
2017-04-14 19:04
It will handle both.

2017-04-14 19:04
hm. the default admin-internal scope is there, I'll fix it up

greg
2017-04-14 19:04
OR create a new one.

greg
2017-04-14 19:04
either way.

greg
2017-04-14 19:04
make sure to get the router at the bottom of the page

2017-04-14 19:05
Is "conduit" the key field that matters

2017-04-14 19:05
not really clear

2017-04-14 19:06
or enable bridge/put in ens3

greg
2017-04-14 19:07
conduit is an abstraction for that.

greg
2017-04-14 19:07
and it is for the client (not the server).

greg
2017-04-14 19:07
So, it is used to force the network to a specific interface on the client.

greg
2017-04-14 19:07
For admin, you can use the dhcp conduit.

greg
2017-04-14 19:08
it will put the network on the interface that dhcp booted.

2017-04-14 19:08
k

greg
2017-04-14 19:08
You can also specify 1g0 or 10g2 or ... a speed and an index.

greg
2017-04-14 19:08
1g0 = first 1g capable interface.

2017-04-14 19:11
Is there something to tickle to make the new network show up in dhcp subnets

greg
2017-04-14 19:12
refresh the UI. that page doesn't always reload.

greg
2017-04-14 19:12
Also. make sure you gave the category in the network as admin.

greg
2017-04-14 19:12
group should be something else.

2017-04-14 19:12
didn't :) changing

2017-04-14 19:13
shows now

greg
2017-04-14 19:13
:slightly_smiling_face:

2017-04-14 19:14
would the dhcp container be the appropriate place to do a tcpdump

greg
2017-04-14 19:15
in host mode, no. The dhcp container is a host networking container. So the host is sufficient.

greg
2017-04-14 19:15
you can also see the DHCP logs

2017-04-14 19:15
yea, those are clean/quiet

greg
2017-04-14 19:15
cd digitalrebar/deploy/compose

greg
2017-04-14 19:16
docker-compose logs -f dhcp

greg
2017-04-14 19:16
okay

2017-04-14 19:16
Looks like all api requets

2017-04-14 19:16
requests*

greg
2017-04-14 19:17
okay - so - we aren't seeing the DHCP requests.

greg
2017-04-14 19:17
hmm

2017-04-14 19:17
how do I verify this is host mode

greg
2017-04-14 19:17
tcpdump or the log

2017-04-14 19:17
this was my deploy cli ./run-in-system.sh --deploy-admin=local --access=host --admin-ip=10.62.7.140/26 --con-dhcp --con-provisioner

2017-04-14 19:17
but it still sat forever on the ansible forwarder task

2017-04-14 19:18
skipped host

greg
2017-04-14 19:18
host should be HOST

greg
2017-04-14 19:18
you are in forwarder mode. We have silly args.
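
Because the lowercase `host` was silently treated as forwarder mode here, a wrapper could normalize the value before passing it to `--access=`. This is a hypothetical sketch: `normalize_access_mode` is a made-up helper, and HOST and FORWARDER are simply the two modes named in this conversation, not a confirmed complete list for `run-in-system.sh`.

```shell
# Upper-case an access mode and reject anything other than the two
# modes mentioned in the chat (HOST, FORWARDER).
normalize_access_mode() {
  mode=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$mode" in
    HOST|FORWARDER) printf '%s\n' "$mode" ;;
    *) echo "unknown access mode: $1" >&2; return 1 ;;
  esac
}
```

The deploy line would then use `--access="$(normalize_access_mode host)"`, which expands to `--access=HOST`.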

2017-04-14 19:18
case check?

2017-04-14 19:18
lol k

2017-04-14 19:18
rerolling, that makes sense for what I'm seeing then :)

greg
2017-04-14 19:19
Yes, please.

2017-04-14 19:19
tx rob, will bbiab

greg
2017-04-14 19:19
really - this is Greg. :slightly_smiling_face: our slack to gitter app bounces through rob.

2017-04-14 19:20
AH woops lol

greg
2017-04-14 19:20
np - all good.

2017-04-14 19:29
dhcp_1 | 2017/04/14 19:28:47 Recieved DHCP packet: type Discover xid 0x8 ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 10.62.0.1 chaddr 70:10:6f:bc:4f:4c
much bettah

2017-04-14 19:29
tx again

greg
2017-04-14 19:29
cool

2017-04-14 19:39
cool @crafty_house_twitter

2017-04-14 22:21
newly deployed digitalrebar system, working with crafty. we are not seeing the nodes added via pxe after dhcp subnet was created. any ideas. nothing in the logs but this:

2017-04-14 22:21
dhcp_1 | 2017/04/14 19:25:30 Recieved DHCP packet: type Discover xid 0xa1be8f5e ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 10.62.16.1 chaddr 9c:dc:71:4a:fa:c0
dhcp_1 | 2017/04/14 19:25:30 xid 0xa1be8f5e: Starting processing: 2017-04-14 19:25:30.948700271 +0000 UTC
dhcp_1 | 2017/04/14 19:25:30 xid 0xa1be8f5e: Config lock acquired: 2017-04-14 19:25:30.948703801 +0000 UTC
dhcp_1 | 2017/04/14 19:25:30 xid 0xa1be8f5e: relay from 10.62.16.1 (hops 1)
dhcp_1 | 2017/04/14 19:25:30 Discover xid 0xa1be8f5e: No subnet for leases

2017-04-14 22:22
any ideas...

greg
2017-04-15 03:59
Do your admin network ranges include 10.62.16.1?

greg
2017-04-15 03:59
@csayler
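
The "No subnet for leases" line means the relay address (giaddr 10.62.16.1) did not match any configured network, which is what greg's question is probing. A small sketch for checking an address against a CIDR range in plain shell follows. The function names are made up, and the 10.62.7.128/26 range mentioned below is only inferred from the `--admin-ip=10.62.7.140/26` shown earlier.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if address $1 falls inside CIDR range $2.
ip_in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

Under that assumption, `ip_in_cidr 10.62.16.1 10.62.7.128/26` fails, so a network covering 10.62.16.0 would have to be added before leases can be handed out for that relay.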

2017-04-15 22:11
Just tried out the standalone Provision. Looks great.

2017-04-15 22:12
I was able to pxe boot a machine, now I was wondering if the machine should appear in machines list (drpcli machines list) ?

2017-04-16 08:29
@zehicle I'm trying to test DRP and I'm blocked when I try to enter the user and password. I put rocketskates:RocketSkates and nothing happens. No login.

2017-04-16 08:44
Sorry, it works with https://<ip>:8092/ui/?token=rocketskates:r0cketsk8ts .merci

greg
2017-04-16 11:52
@chilicat - what did you boot? Did you boot ignore? or Discovery? If discovery, then yes. If ignore, then no.

greg
2017-04-16 11:52
@moula - yeah - one is for dr-provision and one is for the machines that dr-provision installs.

greg
2017-04-16 11:53
More docs to come on how to change both of those.

2017-04-16 15:43
Thanks, got it working. Is it actually possible to trigger a reboot of the machine via cli?

2017-04-16 17:11
@zehicle Thanks.

zehicle
2017-04-16 18:36
@chilicat, in full DR yes. There is a node X reboot option. In DRP, no.

2017-04-16 20:59
@chilicat specifically from the rebar cli -> "rebar nodes poweractions [node]" will give you the options

2017-04-16 20:59
then "rebar nodes power [node] [action]" will take that action. e.g.: reboot is the most common because it works even if we don't have OOB control.

2017-04-16 21:00
if you are looking to wipe and reconfigure the node, that's "rebar nodes redeploy [node]" or you can do a match of them using the deployment redeploy.

2017-04-17 14:01
@zehicle so it's not possible to reboot a physical node after detecting it, in order to install the system!!! I do it with MAAS and cloud-init.

2017-04-17 14:03
@zehicle another thing, if I shut down my DRP server, how can I make it start again automatically? Thank you.

greg
2017-04-17 14:48
@moula - this is @galthaus - That is the point of dr-provision. It is simple and quick. DigitalRebar provides that level of integration today.

greg
2017-04-17 14:48
@moula - run the install.sh without the isolated flag and it will place start up scripts in place, put binaries in place, and show you the service commands to enable and start dr-provision to survive reboots.

wdennis
2017-04-17 14:49
DR team: in DRP, where can I set params for a host? I see there's a "OptionalParams" section in a given bootenv, but can these be set on a per-host basis?

greg
2017-04-17 14:49
@wdennis - they are only set on a per host or global level. I feel another helper coming on.

greg
2017-04-17 14:49
drpcli params is for manipulating global parameters

greg
2017-04-17 14:50
drpcli machines update <uuid> '{ json blob of params}' is for machine specific params

wdennis
2017-04-17 14:50
Thought so - if I do 'drpcli params list' I get empty set

greg
2017-04-17 14:51
Yes - params are free form.

greg
2017-04-17 14:51
They are deep dictionaries.

greg
2017-04-17 14:51
string -> Struct

greg
2017-04-17 14:51
where struct can be a json object, string, int, bool, ...

wdennis
2017-04-17 14:52
I think that params should be perhaps site-specific, apply to a group of hosts

greg
2017-04-17 14:52
For example, you can inject ssh keys by params. They need to be an array of objects. I'll be documenting this.

wdennis
2017-04-17 14:52
Like ntp_servers for example

greg
2017-04-17 14:52
@wdennis - that is where DR comes in. We are trying to give some function but leave grouping and higher order ops to DR.

greg
2017-04-17 14:53
Well - long term vision is many dr-provision units local to regions to handle bandwidth and locality issues with a central DR driving them.

wdennis
2017-04-17 14:53
Ah

wdennis
2017-04-17 14:55
Any docs yet on providing params on a per-host basis?

greg
2017-04-17 14:56
of course not- I think we may be exposing a little too fast, but that is next on my plate. I have an internal thing I'm working on, but those docs and flows are next.

greg
2017-04-17 14:56
:slightly_smiling_face:

wdennis
2017-04-17 14:56
Trying to get to a point where I can substitute DRP for Cobbler in my environment, but missing some functionality (like profiles for groups of servers)

greg
2017-04-17 14:57
that is what we want and I need to help with that.

wdennis
2017-04-17 14:57
I could live with per-host for now

greg
2017-04-17 14:58
how do you specify the group membership and group scope?

greg
2017-04-17 14:59
need to drive for a bit. Back in a little while.

wdennis
2017-04-17 15:01
There would have to be something like profiles in DRP that combine a bootenv, and associated templates, and then instead of defining bootenv per host, can specify a template

wdennis
2017-04-17 15:03
Like for instance in my environment, I have many Ubuntu 16.04 profiles that only differ in things like disk partitioning, root user enabled or not, default user + passwd

wdennis
2017-04-17 15:03
95%+ of the OS install is the same, but these are crucial differences

wdennis
2017-04-17 15:05
(Maybe wrong here) in DRP, a bootenv is a single OS flavor installer map - don't want to have to have many of these to handle the >5% differences

wdennis
2017-04-17 15:06
Sorry for the tidal wave of txt here - do concentrate on driving :)

zehicle
2017-04-17 15:26
@wdennis the advanced workflow and system profiles are already in full DR. DRP is intended to be narrowly scoped and then driven by higher level services.

greg
2017-04-17 15:32
Actually, I think we need to document bootenvs better.

greg
2017-04-17 15:32
If we add template in template (with parameter drivers and examples), I think you have everything you need and more.

wdennis
2017-04-17 15:32
@zehicle It seems to me the value in full-on DR is to take advantage of the DR-provided orchestration of the full OS + software stack (which in something like K8s or OpenStack is quite complex.) What about use case of just install OS + use existing automation developed by end-user to get machines ready for further use (much simpler use case?)

greg
2017-04-17 15:32
The variable expansion into something more than just string replacement is pretty powerful.

wdennis
2017-04-17 15:34
Are bootenvs supposed to be like profiles?

greg
2017-04-17 15:36
bootenvs are intended to be a bootable environment that can be customized per machine.

greg
2017-04-17 15:37
This includes local disks, discovery, installation, coreos diskless, burn-in testing, whatever.

greg
2017-04-17 15:37
We intend the bootenvs to be customizable through parameters. The complexity of the templates contained in the bootenv and how they are tied to the parameters drive the complexity and the mutability of the bootenv.

greg
2017-04-17 15:38
Like you could have no params and hardcode an install. This would generate nxN bootenvs for your description.

greg
2017-04-17 15:39
Another level is to create a bootenv that takes a param that defines user info. The user info is an array of user data objects (object is uname, passhash, groups, ... whatever). Use golang template expanders to parse that list into user names.

greg
2017-04-17 15:39
The next level up is to put that into a template that preseed/kickstart template references.

greg
2017-04-17 15:40
At that point, you can set a variable on the machine that says complex_users= true and user_table=[ {}, {}, ... ]

greg
2017-04-17 15:40
This is what I need to spend time documenting.

2017-04-17 15:55
@zehicle I will try it, Merci beaucoup.

2017-04-17 15:57
@galthaus sorry . merci.

greg
2017-04-17 15:57
@moula - np.

wdennis
2017-04-17 20:13
Some DRP questions relating to host config:
- How to spec initial user/passwd other than default "rocketskates" one
- How to set static IP / mask / gateway
- How to insert desired pubkey in root's .ssh/authorized_keys
- If desired, how to set root password, and allow root access via SSH

wdennis
2017-04-17 20:13
Can these sorts of things be done?

greg
2017-04-17 20:15
well - yes. Some require more work than others. For example, static IP. dr-provision isn't managing address spaces beyond simple DHCP (and in your case not at all). The machine has an address field that can be referenced in a template to then set a static IP.

greg
2017-04-17 20:15
that way. You would have to specify the gateway and mask as new parameters.
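
A rough sketch of that approach in plain Go: the machine address plus two extra params rendered into a kickstart `network` line. `NetParams`, its field names, and the sanity check on the gateway are hypothetical and not part of dr-provision; the example addresses are made up.

```go
package main

import (
	"fmt"
	"net"
	"strings"
	"text/template"
)

// NetParams is a hypothetical bundle of the values a template would need.
type NetParams struct {
	Address string // stands in for the machine's address field
	Netmask string // assumed new param
	Gateway string // assumed new param
}

const networkTmpl = "network --bootproto=static --ip={{.Address}} --netmask={{.Netmask}} --gateway={{.Gateway}}\n"

// renderStaticNetwork checks that the gateway sits inside the subnet implied
// by address and netmask, then renders the kickstart line.
func renderStaticNetwork(p NetParams) (string, error) {
	ip := net.ParseIP(p.Address).To4()
	mask := net.ParseIP(p.Netmask).To4()
	gw := net.ParseIP(p.Gateway).To4()
	if ip == nil || mask == nil || gw == nil {
		return "", fmt.Errorf("address, netmask and gateway must all be IPv4")
	}
	subnet := net.IPNet{IP: ip.Mask(net.IPMask(mask)), Mask: net.IPMask(mask)}
	if !subnet.Contains(gw) {
		return "", fmt.Errorf("gateway %s is outside subnet %s", p.Gateway, subnet.String())
	}
	t, err := template.New("network").Parse(networkTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderStaticNetwork(NetParams{
		Address: "10.11.110.60",
		Netmask: "255.255.255.128",
		Gateway: "10.11.110.1",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Validating before rendering means a mistyped gateway fails at template time instead of leaving a freshly installed machine unreachable.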

wdennis
2017-04-17 20:47
@greg Are any of these handled thru just setting properties (params), or all take template changes?

greg
2017-04-17 20:48
most are already handled with parameters today

greg
2017-04-17 20:49
everything but static ip config is handled through parameters today.

greg
2017-04-17 20:49
and maybe root access via ssh.

wdennis
2017-04-18 01:09
@greg, any docs on setting the above params? (sorry, haven't looked - point me to the URL if they exist plz)

greg
2017-04-18 01:19
I'm writing them tonight

wdennis
2017-04-18 01:24
Thanks - looking forward to implementing them :slightly_smiling_face:

greg
2017-04-18 06:25
@wdennis - check readthedocs with the latest (tip isn't working right). Sigh. http://provision.readthedocs.io/en/latest/doc/arch/data.html#template

greg
2017-04-18 06:26
More to come. Sorry. Hopefully tomorrow.

wdennis
2017-04-18 17:11
Don't be sorry - this looks great. Good docs take time (but do take the time! ;)

wdennis
2017-04-18 17:12
Time to get the lab together & start playing :)

greg
2017-04-18 17:27
More coming - objects mostly complete. Next up is operations and cli examples.

vlowther
2017-04-19 02:32
@wdennis templates including templates probably works now. Still need to write unit tests for it.

vlowther
2017-04-19 02:33
{{template "other" .}} Is the standard go text/template tag to use. https://golang.org/pkg/text/template/#pkg-overview
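
For reference, a small standalone sketch of how that tag behaves in Go's `text/template`. The template names `main` and `other` and the `Hostname` key are illustrative only; how dr-provision registers its templates is not shown here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderNested parses two associated templates and executes "main",
// which pulls in "other" and passes it the same data via the dot.
func renderNested(hostname string) (string, error) {
	root := template.New("root")
	if _, err := root.New("other").Parse(`echo "configured {{.Hostname}}"`); err != nil {
		return "", err
	}
	if _, err := root.New("main").Parse("#!/bin/bash\n{{template \"other\" .}}\n"); err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := root.ExecuteTemplate(&sb, "main", map[string]string{"Hostname": hostname}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderNested("node1")
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The trailing `.` matters: without it the included template receives no data and `{{.Hostname}}` would have nothing to resolve against.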

greg
2017-04-19 02:46
:slightly_smiling_face: And I need to remove the explicit line in our docs saying don't do it.

2017-04-19 23:01
```
labs-rebar:~/digitalrebar/deploy$ docker pull digitalrebar/logging:production
Error response from daemon: manifest for digitalrebar/logging:production not found
```
I'm not seeing this tag on docker hub, and it looks like `run-in-system.sh` is failing when it tries to pull this

2017-04-19 23:06
never mind this

zehicle
2017-04-20 00:01
glad you got past it - let us know if you have other questions

2017-04-20 00:36
`Error: assignment to undeclared variable raw $scope.rawProfiles@https://labs-rebar.###/ux/bundle.min.js:39:8253 this.showEditNodeDialog@https://labs-rebar.###/ux/bundle.min.js:39:9149 anonymous/fn@https://labs-rebar.###/ux/bundle.min.js line 4 > Function:2:471 b@https://labs-rebar.###/ux/bundle.min.js:2:27522 Ic[b]</<.compile/</</e@https://labs-rebar.###/ux/bundle.min.js:5:4794 sf/this.$get</n.prototype.$eval@https://labs-rebar.###/ux/bundle.min.js:3:5260 sf/this.$get</n.prototype.$apply@https://labs-rebar.###/ux/bundle.min.js:3:5492 Ic[b]</<.compile/</<@https://labs-rebar.###/ux/bundle.min.js:5:4844 Pf@https://labs-rebar.###/ux/bundle.min.js:1:18758 Of/d@https://labs-rebar.###/ux/bundle.min.js:1:18707` I'm actually getting this error when I try to go edit a node

2017-04-20 00:38
as well as HSTS warnings but I don't think those are relevant

2017-04-20 00:53
I guess this is happening in firefox only

2017-04-20 00:53
surf/webkit seemed fine

2017-04-20 00:55
nvm, it's happening in other browsers, too

greg
2017-04-20 02:21
yes - we need a container rebuild for that. I think.

greg
2017-04-20 02:21
You can change ux to ux-dev and it will work around it, I believe.

2017-04-20 02:26
Yeah, I tried ux-dev and was still running into the same issue

2017-04-20 02:27
but also, question, does sledgehammer/the discovery image wipe out disks?

2017-04-20 02:27
or partition tables at least

greg
2017-04-20 02:35
yes as part of the OS provisioning step

2017-04-20 14:28
welp

2017-04-20 14:28
that makes it harder to introduce in existing environments

2017-04-20 14:29
(we used to just wipe partition tables during preseed/kickstart)

greg
2017-04-20 14:44
Only as part of a os install request

greg
2017-04-20 14:45
It is part of a role that runs in sledgehammer. Sledgehammer itself doesn't.

2017-04-20 14:49
hm, this is weird then

greg
2017-04-20 14:51
what are you seeing?

greg
2017-04-20 14:54
It is part of the provisioner-os-install role - see core/script/roles/provisioner-os-install/01-install-os.sh

2017-04-20 14:55
```
[root@d00-25-90-59-d5-62 ~]# parted -s /dev/sda print
Error: /dev/sda: unrecognised disk label
Model: LSI Logical Volume (scsi)
Disk /dev/sda: 2396GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:
```
two servers that I guess their owner rebooted for whatever reason look like this after booting up the discovery image - but I do see other servers that booted up into discovery that still have their partition tables intact

greg
2017-04-20 14:57
Does it have a LSI raid controller? Do you have any raid configured? Did you drive that system to any state other than just being discovered?

2017-04-20 14:57
yes, it does have an LSI raid controller - and no, I just let it be discovered

greg
2017-04-20 14:57
so - it should have touched anything.

greg
2017-04-20 14:58
should not

2017-04-20 14:58
I'm going to try to see if I can reproduce it somehow

greg
2017-04-20 14:59
oh - does the system have multiple disks? Could they be enumerating differently?

2017-04-20 15:00
the ones that are affected only have one disk configured on the RAID controller

wdennis
2017-04-20 16:03
@greg - why does DRP control the Ubuntu apt sources instead of using std upstream Ubuntu ones?


2017-04-20 16:06
that's the repo extracted from ubuntu ISOs isn't it

greg
2017-04-20 16:07
yes

greg
2017-04-20 16:07
It tries to keep traffic local to the ISO if possible.

wdennis
2017-04-20 16:26
What about after install? Shouldn't there be a switch to std Ubuntu repos?

wdennis
2017-04-20 16:27
The ISO one doesn't have security updates, or universe

greg
2017-04-20 16:27
Interesting. I thought we left them in post install. It is easy enough to change.

wdennis
2017-04-20 16:28
Found this out when I went to install some sw tools I usually put in right after install (ethtool, tree) and they weren't available

greg
2017-04-20 16:30
yeah - okay - so you probably want to delete lines 55-61 in the net-post-install.sh.tmpl

greg
2017-04-20 16:30
That will make the default sources.list file not get overwritten.

wdennis
2017-04-20 16:30
OK

greg
2017-04-20 16:31
Another alternative that might work is to change the /etc/apt/sources.list reference to /etc/apt/sources.d/drp.repo

wdennis
2017-04-20 16:31
Maybe that should be standard, or at least selectable via a property

greg
2017-04-20 16:32
Yes - issue please. Interestingly enough - since you got us off our rears to add template in template support. I'll be reworking these quite a bit to add those actions.

wdennis
2017-04-20 16:32
Yes, agree, that would be better

wdennis
2017-04-20 16:33
Will do

wdennis
2017-04-20 16:36
In other matters, any DR folk planning on going to USENIX LISA'17?

greg
2017-04-20 16:36
maybe - that is sooooooooo far away. :slightly_smiling_face:


wdennis
2017-04-20 16:37
It would be a good place to demo your wares or give a talk at

greg
2017-04-20 16:38
good to know. Rob may have gone to the 16

wdennis
2017-04-20 16:39
I may (or may not) be submitting a talk (on Logstash improvements our lab has developed)

greg
2017-04-20 16:41
cool

2017-04-20 16:54
hm, I couldn't reproduce the wiped partition table issue from what I thought might have been a catalyst

greg
2017-04-20 16:55
ok - cool - i guess.

greg
2017-04-20 16:56
- PSA - I've updated the k8s workload to the latest kargo tree. It now defaults to v1.6.1

2017-04-20 17:25
@zehicle is this something I can do about now? edit/redeploy/reboot buttons still don't seem to be functional

greg
2017-04-20 17:27
@lae - switch to cli.

greg
2017-04-20 17:27
to make sure it works.

greg
2017-04-20 17:27
on the admin node, I usually do this:

greg
2017-04-20 17:27
docker cp compose_rebar_api_1:/usr/local/bin/rebar /usr/local/bin/rebar

greg
2017-04-20 17:27
chmod +x /usr/local/bin/rebar

greg
2017-04-20 17:27
export REBAR_KEY=u:p

greg
2017-04-20 17:28
rebar nodes list

greg
2017-04-20 17:28
rebar nodes redeploy <id of node in question>

greg
2017-04-20 17:29
Are the errors in the console of the browser? or in the rebar_api container log?

greg
2017-04-20 17:29
Getting some of that would be good. All of that in an issue would be better. :slightly_smiling_face:

2017-04-20 17:30
console of the browser, I wasn't seeing anything while watching `docker-compose logs`

greg
2017-04-20 17:30
okay - still the raw error?

2017-04-20 17:30
raw error occurs for edit, but there are different errors for redeploy/reboot

greg
2017-04-20 17:31
awesome

2017-04-20 17:32
so I'm not too familiar with the cli - how do I change a node to change to `local` for redeployment?

greg
2017-04-20 17:32
```rebar nodes redeploy <id>```

2017-04-20 17:33
right, I did that - but does that change it to local? I still see bootenv: sledgehammer in the output

greg
2017-04-20 17:33
Will set the bootenv to discovery/sledgehammer, update the node roles run counts to 0

greg
2017-04-20 17:33
and reboot the node if possible.

greg
2017-04-20 17:34
You should be able to reboot the node now and it should go into sledgehammer.

greg
2017-04-20 17:35
redeploy in the DR sense means to redo discovery and reapply the current config

2017-04-20 17:35
ah ok

greg
2017-04-20 17:35
You may need to manually reboot the node if IPMI is NOT configured and the node is not SSH able.

2017-04-20 17:36
in the UI there's a dropdown when clicking the redeploy icon so I assumed something different

2017-04-20 17:36
ipmi's usable, I have a small tool for that

greg
2017-04-20 17:36
ah - the dropdown lets you reset the OS you want to provision.

2017-04-20 22:12
```
rebar nodes propose 2
rebar nodes update 2 '{"bootenv": "ubuntu-16.04-install"}'
rebar nodes commit 2
rebar nodes power 2 reboot
```
is this the proper workflow or am I missing a step?

2017-04-20 22:18
hm, so if I run `nodes update` with either `local` or `sledgehammer` bootenv it'll change the pxelinux config file fine, but `ubuntu-16.04-install` and `centos-7.3.1611-install` don't change it at all

2017-04-20 22:37
Hey folks, if I run "join_rebar.sh" I authenticate OK, but get a * Connection #0 to host 192.168.1.63 left intact curl: (22) The requested URL returned error: 502 Bad Gateway We could not create a node for ourself!

greg
2017-04-20 22:45
@lae - do ```rebar provisioner machines list``` and see if there are errors there.

greg
2017-04-20 22:46
@newgoliath - that script is old and fragile. It check to make sure it is NOT using 3000 as its port to connect to the admin node.

2017-04-20 22:46
It auths to rebar OK.

2017-04-20 22:47
the ansible installer isn't done yet. Maybe I'm being too eager. :-)

greg
2017-04-20 22:50
oh - patience

2017-04-21 00:41
```
root@labs-rebar:~/digitalrebar/deploy# rebar provisioner machines list
2017/04/21 00:41:31 Error listing provisioner machines: Expected status in the 200 range, got 404 Not Found
```

2017-04-21 00:42
hm :|

2017-04-21 00:43
```
provisioner_1 | [GIN] 2017/04/21 - 00:39:56 | 200 | 548.631µs | 172.17.0.11 | GET /bootenvs/ubuntu-16.04-install
provisioner_1 | [GIN] 2017/04/21 - 00:40:27 | 200 | 530.185µs | 172.17.0.11 | GET /bootenvs/local
provisioner_1 | provisioner-mgmt2017/04/21 00:40:27.680734 backend: Updating bd71b2b6-6687-4b33-9861-1148c6fc2a63 1
provisioner_1 | provisioner-mgmt2017/04/21 00:40:27.680743 backend: Updating new bd71b2b6-6687-4b33-9861-1148c6fc2a63 1
provisioner_1 | machines:bd71b2b6-6687-4b33-9861-1148c6fc2a63 is a ChangeHooker
provisioner_1 | [GIN] 2017/04/21 - 00:40:27 | 202 | 138.131765ms | 172.17.0.11 | POST /machines
```
I see this behaviour in provisioner logs

2017-04-24 16:56
are profiles not usable in rebar, yet?

2017-04-24 16:56
(I don't see any results when I search for "profile" in the docs, and the parameters I'm setting in profiles aren't being inserted in the pxelinux configuration files)

greg
2017-04-24 17:04
Two different profiles.

greg
2017-04-24 17:04
Profiles described in digitalrebar provision are not in digitalrebar.

greg
2017-04-24 17:04
There are profiles in digitalrebar, but they are for node/deployment attribute overrides

greg
2017-04-24 17:05
They are not available to the provisioner.

greg
2017-04-24 17:05
This is a disconnect that will be remedied in the coming months. We are inbetween.

greg
2017-04-24 17:06
@lae - Do you need the orchestration/hardware pieces (IPMI, BIOS, RAID) for what you are doing? If not, you may want to try digitalrebar provision. It may be more immediately applicable.

2017-04-24 17:15
They're not necessarily needed since I already have tools/processes for that. So I was attempting to use dr-provision earlier this morning, but it seems I'm having trouble possibly having it integrate with our current Infoblox setup?

2017-04-24 17:16
I could configure a subnet in dr-provision but it'd then allocate IPs itself, but if I leave subnets out, machines don't seem to boot:
```
PXELINUX 6.03 lwIP 2014-10-06 Copyright (C) 1994-2014 H. Peter Anvin et al
Unable to locate configuration file
```

2017-04-24 17:16
```
dr-provision2017/04/24 17:12:26.849155 Recieved DHCP packet: type Discover xid 0x6b6d145f ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:16:3e:07:33:21
dr-provision2017/04/24 17:12:27.829885 Recieved DHCP packet: type Discover xid 0x6b6d145f ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:16:3e:07:33:21
dr-provision2017/04/24 17:12:29.807194 Recieved DHCP packet: type Request xid 0x6b6d145f ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:16:3e:07:33:21
dr-provision2017/04/24 17:12:29.807382 xid 0x6b6d145f: Ignoring request for DHCP server 10.13.2.24
dr-provision2017/04/24 17:12:36.196744 sending block 0: code=0, error: TFTP Aborted
```

2017-04-24 17:18
also I guess I should probably repull dr-provision since I see that PR on profiles was just merged

greg
2017-04-24 17:25
well - you can turn off dhcp in dr-provision.

greg
2017-04-24 17:25
okay - a couple of things.

greg
2017-04-24 17:26
I've updated tip with all the stuff from this morning - should be out there in about 10 minutes. You will need to get the tarball from the tip or use --rs-version=tip in the curl bash install command.

greg
2017-04-24 17:27
Second, you can use DR-provision with DHCP off and let infoblox drive IP assignment. You will need to set the next-server to dr-provision and set the bootfile to lpxelinux.0 (assuming legacy Intel BIOS machines). If you are going to do discovery, you will also need to set option 15 (domain name) to something.

greg
2017-04-24 17:29
I need to merge configure server and deployment. They are similar but different.

2017-04-24 17:32
my Infoblox as far as I know is configured correctly for both legacy/uefi systems at the moment (well, the bootfiles are prefixed with discovery/ still, but I created a symlink in drp-data/tftpboot)

greg
2017-04-24 17:33
won't work. The filesystem isn't live like that. The non-static files are generated on the fly when requested. This allows for parameters from profiles and other things to be injected as needed, so things aren't out of date.

greg
2017-04-24 17:34
I have some docs about that, but they are still coalescing.

2017-04-24 17:35
is that a difference between DR and dr-provision? PXE boot was working correctly on DR with this configuration - I can have it modified but since I don't admin the infoblox instance, I've gotta wait until a change gets implemented before I can test things again

greg
2017-04-24 17:35
yes

greg
2017-04-24 17:36
We are attempting to handle more dynamic and richer bootenvs/templates with dr-provision.

2017-04-24 17:38
Alright, just to confirm - based on what you said this config should be fine, right?
```
subnet 10.11.110.0 netmask 255.255.255.128 {
  option domain-name "eng.fireeye.com";
  option domain-name-servers 10.11.10.12, 10.11.10.140;
  option routers 10.11.110.1;
  server-name "10.11.110.50";
  next-server 10.11.110.50;
  filename "lpxelinux.0";
  if (substring(option vendor-class-identifier,0,9)="PXEClient:Arch:00006") {
    # Option filter "UEFI-BOOTIA32.efi"
    server-name "10.11.110.50";
    next-server 10.11.110.50;
    filename "bootia32.efi";
  } elsif (substring(option vendor-class-identifier,0,9)="PXEClient:Arch:00007") {
    # Option filter "UEFI-BOOTX64.efi"
    server-name "10.11.110.50";
    next-server 10.11.110.50;
    filename "bootx64.efi";
  }
}
```

greg
2017-04-24 17:42
I think so. The substring options look okay.
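
One detail that may be worth double-checking in a config of this shape: if dhcpd-style `substring(data, offset, length)` semantics apply (an assumption about how Infoblox evaluates the expression), a length of 9 yields only `PXEClient`, which can never equal a 20-character string such as `PXEClient:Arch:00007`. The sketch below is a Go model of that comparison, not dhcpd itself; the vendor class value is an illustrative example and `bootfile` is a hypothetical helper.

```go
package main

import "fmt"

// substring models dhcpd's substring(data, offset, length): at most
// length bytes of s, starting at offset.
func substring(s string, offset, length int) string {
	if offset < 0 || length < 0 || offset >= len(s) {
		return ""
	}
	end := offset + length
	if end > len(s) {
		end = len(s)
	}
	return s[offset:end]
}

// bootfile picks a boot file the way the if/elsif chain in the config does,
// with the substring length made explicit.
func bootfile(vendorClass string, prefixLen int) string {
	switch substring(vendorClass, 0, prefixLen) {
	case "PXEClient:Arch:00006":
		return "bootia32.efi"
	case "PXEClient:Arch:00007":
		return "bootx64.efi"
	}
	return "lpxelinux.0" // legacy BIOS default
}

func main() {
	vc := "PXEClient:Arch:00007:UNDI:003016" // example x64 UEFI client
	fmt.Println(bootfile(vc, 9))
	fmt.Println(bootfile(vc, 20))
}
```

Under that assumption legacy BIOS clients are unaffected, since they fall through to `lpxelinux.0` either way; only the UEFI branches would be skipped.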

2017-04-24 22:04
Alright, sledgehammer booted! :ok_hand:

greg
2017-04-24 22:08
:slightly_smiling_face: Cool

greg
2017-04-24 22:09
I'll be on phone but driving.

greg
2017-04-24 22:09
in 10 minutes.

greg
2017-04-24 22:15
on phone

zehicle
2017-04-25 01:15
@greg wrong channel :slightly_smiling_face:

2017-04-25 12:30
question, is it possible to provision bare-metal from the ground up (configure RAID, install OS, etc.) with DRB and DRB-Provision? I work at Cisco and (obviously) we use Cisco equipment. I have been playing with DRB and DRBP, so far every thing works fine. Now I'm looking for guidance on how to go from metal-to-running_app using DRB.

greg
2017-04-25 13:03
Hi @vjcubas. It probably is. We haven't done much with Cisco gear. We usually have success with discovery and OS install and beyond. There are sometimes some issues with IPMI. RAID will be dependent upon what RAID controller is in the gear. BIOS configuration and setting is unlikely without some additional coding. It is the most hardware-specific thing. It can be added but it takes some time and direct access to a box (vendor tools help as well).

greg
2017-04-25 13:05
With regard to DRB (DR) vs DRB-Provision (DRP). Currently, DRP is just about OS installation and discovery. It is not currently integrated with DR. So, if you are looking for Orchestration or hw manipulation, then you will want to focus on DR. If you have DR working, then you are doing well.

greg
2017-04-25 13:06
We may need to have a discussion around what you are trying to do, where you are, and end states.

greg
2017-04-25 13:07
If you want, we can talk off-line (email me at )

greg
2017-04-25 13:10
Also, what do you have running? DRP vs DR or both

2017-04-25 17:20
right now I have DRP - and I was wondering if we could run DR and DRP on the same server

greg
2017-04-25 17:20
They will fight currently, with port contentions. well - let me think a moment.

greg
2017-04-25 17:21
okay - you kinda can, but the integration would be pretty light and may not be what you want.

greg
2017-04-25 17:22
Partially, why I asked what your end goal is.

greg
2017-04-25 17:24
You could run DRP with DR on the same system if you turn off the DHCP and Provisioner server in DR. To hook DRP to DR, you would need to create a script to "join" installed nodes into DR after it was done. The machines would show up (you probably couldn't manage lifecycle directly from DR, but it would be something).

greg
2017-04-25 17:24
Actually, this is an interesting quick integration path for the short term. Hmmm .. not what I want medium to long term. Need to think.

greg
2017-04-25 17:25
The intent is DR for more full featured environments (hardware settings, workload orchestration, ...).

greg
2017-04-25 17:25
Back to your end state goals.

2017-04-25 17:42
Getting errors on "logging-client" complaining about system clock "INFO: HTTP Request Returned 401 Unauthorized: Authentication failed. Please check your system's clock". Is there a step to run ntpdate that is missing in deployment?

greg
2017-04-25 17:43
Yes - it should run ntpdate as part of sledgehammer.

greg
2017-04-25 17:43
As part of common.env, you can set an upstream ntp server. by default it will make the dr master node the time server.

greg
2017-04-25 17:44
You can check the sledgehammer log by logging into the node in question: root/rebar1

greg
2017-04-25 17:44
Then: journalctl -u sledgehammer

greg
2017-04-25 17:44
Look for ntp

greg
2017-04-25 17:44
it may give a hint at what is going on.

2017-04-25 17:46
Added 4 hosts to deployment. Only 1 worked. The date on the working system, the time is 10:30 UTC. The date on the provisioner is 3:30 PDT. The command journalctl -u sledgehammer returns "-- No Entries --"

2017-04-25 17:49
The contents of /var/chef/cache/chef-stacktrace.out starts with Net::HTTPServerException: 401 "Unauthorized"

greg
2017-04-25 17:50
I've seen that sometimes the ntp server on the node can get out of sync with itself or take a long time to stabilize. I'm not sure what is causing this. Sometimes restarting the ntp container will "fix" it.

greg
2017-04-25 17:50
Is this post OS install or during discovery?

2017-04-25 17:51
How do I get in docker to set the date? It is post OS install. I cannot run ntpdate on the provisioner and get an error of "25 Apr 03:49:47 ntpdate[19206]: the NTP socket is in use, exiting"

greg
2017-04-25 17:52
```ntpq -p``` to check the state of ntp on the admin node

greg
2017-04-25 17:52
The docker container should be using the host's clock

2017-04-25 17:53
ntpq -p returns "localhost: timed out, nothing received"

greg
2017-04-25 17:53
okay - docker ps | grep ntp

2017-04-25 17:54
c492475e0902 digitalrebar/dr_ntp:master "/sbin/docker-entr..." 4 days ago Up 4 days 0.0.0.0:123->123/tcp, 0.0.0.0:123->123/udp compose_ntp_1

greg
2017-04-25 17:55
What install mode did you use for DR?

2017-04-25 17:56
host

greg
2017-04-25 17:56
okay - cool - hmmm

greg
2017-04-25 17:56
ntpstat

greg
2017-04-25 17:56
Post Install on all 3 of the failing nodes.

greg
2017-04-25 17:57
That service is different. I think it is rebar

2017-04-25 17:57
Yes, but one post install succeeded

greg
2017-04-25 17:57
Yes - if clock was close it would be fine.

greg
2017-04-25 17:57
chef is very very time sensitive in its cert management.

greg
2017-04-25 17:58
So - my guess is that the 3 nodes didn't time sync for some reason and didn't join correctly. The logging-client role is the first chef role encountered.
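
To make the time sensitivity concrete, here is a small Go sketch that measures the skew between a node clock and a server clock. The 900-second tolerance is an assumed figure for illustration and should be checked against what the Chef server in use actually enforces; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// assumedMaxSkewSeconds is an assumption for illustration, not a value
// taken from this deployment.
const assumedMaxSkewSeconds = 900.0

// skewSeconds returns the absolute difference between two RFC 3339
// timestamps in seconds. Time zones are normalised by time.Parse.
func skewSeconds(nodeTime, serverTime string) (float64, error) {
	n, err := time.Parse(time.RFC3339, nodeTime)
	if err != nil {
		return 0, err
	}
	s, err := time.Parse(time.RFC3339, serverTime)
	if err != nil {
		return 0, err
	}
	return math.Abs(n.Sub(s).Seconds()), nil
}

// withinSkew reports whether the two clocks agree to within maxSeconds.
func withinSkew(nodeTime, serverTime string, maxSeconds float64) (bool, error) {
	d, err := skewSeconds(nodeTime, serverTime)
	if err != nil {
		return false, err
	}
	return d <= maxSeconds, nil
}

func main() {
	// 10:30 UTC and 03:30 PDT (UTC-7) are the same instant.
	ok, err := withinSkew("2017-04-25T10:30:00Z", "2017-04-25T03:30:00-07:00", assumedMaxSkewSeconds)
	if err != nil {
		panic(err)
	}
	fmt.Println("same instant, different zones:", ok)

	ok, err = withinSkew("2017-04-25T10:50:00Z", "2017-04-25T10:30:00Z", assumedMaxSkewSeconds)
	if err != nil {
		panic(err)
	}
	fmt.Println("node 20 minutes ahead:", ok)
}
```

Comparing instants rather than wall-clock strings matters here: a node showing UTC and a provisioner showing PDT can be perfectly in sync.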

greg
2017-04-25 17:58
What os?

2017-04-25 17:59
rhel 7, do I need to start ntpd on provisioner or just run ntpdate to set the time correctly for docker ntp?

greg
2017-04-25 17:59
is the provisioner's time off?

greg
2017-04-25 17:59
ntpdate -q on the admin node should adjust it on the docker container. The problem is it might wig out ntp for a bit.

greg
2017-04-25 18:01
on centos7/rhel7 systems, you should have a rebar service

greg
2017-04-25 18:01
systemctl status rebar

greg
2017-04-25 18:01
journalctl -u rebar

2017-04-25 18:02
systemctl status rebar Unit rebar.service could not be found.

greg
2017-04-25 18:03
hmmm - did you use the centos7 ks script? it should have created:

greg
2017-04-25 18:04
/usr/sbin/rebar_join

2017-04-25 18:04
We worked last friday on the phone and setup rhel-7.3-server

greg
2017-04-25 18:04
oh - Darren.

2017-04-25 18:04
Yes

2017-04-25 18:05
/usr/sbin/rebar_join -bash: /usr/sbin/rebar_join: No such file or directory

greg
2017-04-25 18:07
okay - checking files.

2017-04-25 18:09
rebar_join on box being installed returns ...
/usr/sbin/rebar_join@48(): [[ -x /bin/rebar ]]
/usr/sbin/rebar_join@53(): export REBAR_ENDPOINT=https://192.168.128.10
/usr/sbin/rebar_join@53(): REBAR_ENDPOINT=https://192.168.128.10
/usr/sbin/rebar_join@65(): ntpdate 192.168.128.10
/usr/sbin/rebar_join@67(): case $1 in
/usr/sbin/rebar_join@71(): echo 'Unknown action to rebar_join.sh.'
Unknown action to rebar_join.sh.
/usr/sbin/rebar_join@72(): exit

greg
2017-04-25 18:09
rebar_join start

greg
2017-04-25 18:11
in fact, that ran successfully on machine startup because you got the logging_client error.

greg
2017-04-25 18:13
You could try ntpdate -q and then retry the node role.

2017-04-25 18:19
The rebar_join start worked - now they are all finishing the install correctly. Thanks

2017-04-25 18:20
Still had problem with bios-discover step - had to "retry" on every system and it then passed that step. The ntp problem is new

greg
2017-04-25 18:24
hmm - seems like we still have some networking communication issue. Things aren't consistently talking, it seems.

2017-04-25 18:30
Is this because we are running "tools/docker-admin-up --access HOST --no-pull" instead of "tools/docker-admin-up --access HOST"?

2017-04-25 18:32
I remember now, the --no-pull option just was used to not download images from the Internet, correct?

greg
2017-04-25 19:45
yes

greg
2017-04-25 19:45
@intendo - yes

2017-04-25 20:53
I restarted digital rebar with:

2017-04-25 20:53
cd ~/digitalrebar/core; tools/docker-admin-down ; sleep 10 ; EXTERNAL_IP=192.168.128.10/24 tools/docker-admin-up --access HOST --no-pull

2017-04-25 20:54
None of my nodes, templates, etc. are showing up from the rebar api or GUI. What did I forget to do?

greg
2017-04-25 21:10
ohh - that is a destructive install. You need to reboot the nodes for them to be rediscovered and wait in sledgehammer for an OS to install.

greg
2017-04-25 21:10
If you are just wanting to restart containers, ```docker-compose restart``` from the deploy/compose directory will work.

zehicle
2017-04-25 21:42
Gitter/IRC users > we can get you invites directly into this slack channel if you prefer Slack. They are all synchronized, so there's no need to switch if you are happy where you are

2017-04-25 22:06
I have resisted the slack, but... I am in so many slack channels now might as well go slack native.

2017-04-25 22:07
If you can shoot invites to mike, ben, jordan @supergiant.io

2017-04-25 22:10
@zehicle - how do I get my templates back into digital rebar? Do I have to rerun all the commands?
rebar provisioner files upload chef-12.18.31-1.el6.x86_64.rpm to chef/chef-12.18.31-1.el6.x86_64.rpm
rebar provisioner templates upload rhel-7.3-server.ks.tmpl as rhel-7.3-server.ks.tmpl
rebar provisioner bootenvs create - < rhel-7.3-server.json

greg
2017-04-25 22:11
The chef should still be there in the cache directory if you used the same user. The other two commands are correct and this is @greg. :slightly_smiling_face:

2017-04-25 22:31
@galthaus, Just FYI, everything you type shows up as "Rob Hirshfeld @zehicle\n [Greg Althaus, RackN]" and then your message.

2017-04-25 22:32
Yeah - it is the side effect of the slack to gitter integration we are using.

2017-04-25 22:32
I use the slack side - all my groups are in there. It is easier for me to watch that one place.

2017-04-25 22:32
Thanks though for pointing it out. :-)

greg
2017-04-25 22:33
@galthaus will pop a notification in gitter for me. @greg will pop up the notification in slack. Now you all can hound me everywhere. :slightly_smiling_face:

2017-04-25 23:21
@greg where is the official slack, or can I get some invites sent?

greg
2017-04-25 23:21
working on it. Have to remember. :slightly_smiling_face:

greg
2017-04-25 23:26
@jordan - done, I think.

jordan
2017-04-25 23:28
has joined #json

jordan
2017-04-25 23:28
Thanks

greg
2017-04-25 23:28
:slightly_smiling_face:

jordan
2017-04-25 23:30
Mike is going to start working on http://packet.net integration for SuperGiant tomorrow, and likely at the same time by extension digital rebar, so you will probably see him around these parts for a while as we work through that.

jordan
2017-04-25 23:31
You know anyone else doing anything like packet? Other than softlayer - sort of - I don't know anywhere else one can pop raw hardware?

greg
2017-04-25 23:32
yeah - not sure either. We were playing with Nobis/Ubiquity at one point, but I think they got acquired. Not sure the state of their universe.

jordan
2017-04-25 23:36
Have you worked with softlayers bare metal offerings at all?

greg
2017-04-25 23:37
nope - haven't looked. Was/Is on a One Day provider thing, but not recently

jordan
2017-04-25 23:37
or well, bluemix technically I guess

2017-04-26 00:00
@galthaus, I have rerun the provisioner and I am trying to create a workload -> Install O/S but I can't see a list of the systems. How do I put the systems back in the pool? I am still logged into two of the systems. I turned off 4 systems and then turned them back on but nothing is showing up in the ux.

greg
2017-04-26 00:01
Nodes once discovered should show up in the system deployment the first time and under the nodes nav tree item.

greg
2017-04-26 00:01
If the nodes don't show up there, then they haven't been discovered (or discovered successfully).

greg
2017-04-26 00:02
Are your nodes configured to PXE boot by default?

2017-04-26 00:09
@galthaus How do you configure them to PXE boot by default? Is that set in the BIOS? The first time I turned them on, they were visible by the UX. After the install, I did the "destructive teardown" and now none of the systems are visible. Did the install change the PXE boot to default to local disk?

greg
2017-04-26 00:21
Well - if I recall, you told me that the nodes were wiped. If the disks were wiped, they'd fall back to PXE most likely.

greg
2017-04-26 00:21
You can try the IPMI tool to force a pxe boot.

greg
2017-04-26 00:22
You may have one of the command history with the power status

greg
2017-04-26 00:24
ipmitool -U <username> -P <password> -H <ip address> chassis bootdev pxe

greg
2017-04-26 00:26
In theory, this will make it consistently pxe boot.

greg
2017-04-26 00:26
```ipmitool -U <username> -P <password> -H <ip address> chassis bootdev pxe options=persistent```

2017-04-26 00:32
@galthaus how do I get the IP addresses?

2017-04-26 00:33
I can ssh to the bmc IP address but I get the SMASH console

greg
2017-04-26 00:34
Well - will need to find them, probably since the nodes are deleted. You have the bmc network. The addresses were assigned from there. They start with the start address. The networks nav in the UI should get you to the bmc network definition to see the starting address.

2017-04-26 00:57
@galthaus here is what I did to get _most_ of them set to bootdev pxe:
# for ip in 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215; do echo $ip; ipmitool -H 192.168.128.$ip -U root -P rebar1 chassis bootdev pxe options=persistent; done
200
Error: Unable to establish LAN session
201
Set Boot Device to pxe
202
Set Boot Device to pxe
203
Set Boot Device to pxe
204
Set Boot Device to pxe
205
Set Boot Device to pxe
206
Set Boot Device to pxe
207
Set Boot Device to pxe
208
Set Boot Device to pxe
209
Set Boot Device to pxe
210
Set Boot Device to pxe
211
Set Boot Device to pxe
212
Set Boot Device to pxe
213
Error: Unable to establish LAN session
214
Error: Unable to establish LAN session
215
Error: Unable to establish LAN session

greg
2017-04-26 00:59
Cool - now reboot them. You can use the ipmitool chassis power cycle

greg
2017-04-26 00:59
like above but different. :slightly_smiling_face:
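
A dry-run sketch in Go of the two passes described here: build the `ipmitool` command lines for a range of BMC addresses, first for `chassis bootdev pxe options=persistent`, then for `chassis power cycle`. It only prints the commands and does not execute anything; the helper names and the `USER`/`PASS` placeholders are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// bmcHosts expands a prefix such as "192.168.128." and an inclusive range
// of last octets into a list of BMC addresses.
func bmcHosts(prefix string, first, last int) []string {
	if last < first {
		return nil
	}
	hosts := make([]string, 0, last-first+1)
	for i := first; i <= last; i++ {
		hosts = append(hosts, fmt.Sprintf("%s%d", prefix, i))
	}
	return hosts
}

// ipmiCommand builds the argv for one ipmitool call against one BMC.
func ipmiCommand(host, user, pass string, action ...string) []string {
	argv := []string{"ipmitool", "-H", host, "-U", user, "-P", pass}
	return append(argv, action...)
}

func main() {
	hosts := bmcHosts("192.168.128.", 200, 215)

	// Pass 1: make PXE the persistent boot device.
	for _, h := range hosts {
		fmt.Println(strings.Join(ipmiCommand(h, "USER", "PASS",
			"chassis", "bootdev", "pxe", "options=persistent"), " "))
	}
	// Pass 2: power cycle so the nodes actually PXE boot.
	for _, h := range hosts {
		fmt.Println(strings.Join(ipmiCommand(h, "USER", "PASS",
			"chassis", "power", "cycle"), " "))
	}
}
```

Printing first makes it easy to review the list, and to drop addresses that returned "Unable to establish LAN session", before running anything.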

2017-04-26 01:30
@galthaus rebooted them but no new nodes showing up in list

greg
2017-04-26 01:31
do you see DHCP leases for the nodes in the DHCP nav section.

2017-04-26 01:34
@galthaus yes, but only 4 nodes are binding

greg
2017-04-26 01:36
hmmm - so nothing in the nodes list. hmm

2017-04-26 01:37
@galthaus there are 4 nodes in the nodes list but there should be 16

greg
2017-04-26 01:37
ah - okay.

2017-04-26 01:37
or at least 12

2017-04-26 01:37
@galthaus - would this be easier to skype?

greg
2017-04-26 01:39
probably - can we try tomorrow morning?

2017-04-26 01:41
@galthaus yes, I will try to call you but I have meetings up the wazzu :-)

greg
2017-04-26 01:41
okay - sorry I'm fading is the problem and I need to finish some doc work I'm doing.

greg
2017-04-26 05:15
A new release of Digital Rebar Provision is out, v3.0.1 and stable has been updated. https://github.com/digitalrebar/provision/releases/tag/v3.0.1

2017-04-26 15:01
@galthaus Ready whenever you are to Skype.

greg
2017-04-26 15:01
there

2017-04-26 18:16
You guys ever test DR on Centos 6.9?

greg
2017-04-26 18:47
not specifically.

2017-04-26 21:08
can I get a slack invite? lae@lae.is

greg
2017-04-26 21:21
Done

lae
2017-04-26 21:41
has joined #json

2017-04-27 14:22
I've been looking at the DRP docs - trying to add custom ISO image - I'm not clear on the process, does anyone here have suggestions on this topic. I apologize if this is not the proper channel for posting DRP questions - Thanks in advance:)

greg
2017-04-27 14:39
I'm working on docs now. Check the operations section about bootenvs. What are you trying to do?

gopherstein
2017-04-27 14:57
has joined #json

2017-04-27 15:01
we have a custom CentOS-7 iso that we'd like to deploy to the targeted hosts - also, once this image has been installed on those hosts, I'd like to be able to reboot them without re-imaging them through the PXE boot - I know changing the boot order in the bios would accomplish this, but perhaps there is a better/easier way to do so - thanks

lae
2017-04-27 15:02
you'd set bootenv to local as part of %post in your centos kickstart file

greg
2017-04-27 15:28
Thanks, @lae! Exactly. I'm in the process of documenting and templatizing this so that you can include it in any custom kickstart. Something like this:

greg
2017-04-27 15:29
``` {{ template "update-drp-local" . }} ```

2017-04-27 18:14
thanks, another question: is there a way to install ssh key during the deployment, or to set user/password - BTW, what is the default user/pass for the centos-7-3.1611 image after deployment?

greg
2017-04-27 19:25
Yes and yes.

greg
2017-04-27 19:25
root/RocketSkates



greg
2017-04-27 19:28
There are some more docs and examples coming describing just those issues.

2017-04-28 19:58
@galthaus Our reinstall (ll41) is "stuck" in the provisioner-os-install mode where it keeps repeating the output: 2017/04/28 19:44:15 Could not connect to Rebar: Head https://127.0.0.1:3000/api/v2/digest: dial tcp 127.0.0.1:3000: getsockopt: connection refused //tmp/scriptjig-Dkrnn4/provisioner-os-install/01-install-os.sh@51(): sleep 1 //tmp/scriptjig-Dkrnn4/provisioner-os-install/01-install-os.sh@50(): rebar nodes get d71485f0-5c26-4438-8b9c-9648c8b2b4b5 attrib provisioner-active-bootstate //tmp/scriptjig-Dkrnn4/provisioner-os-install/01-install-os.sh@50(): grep -q -- -install

greg
2017-04-28 20:01
rebar nodes update d71485f0-5c26-4438-8b9c-9648c8b2b4b5 '{ "bootenv": "local" }'

greg
2017-04-28 20:01
rebar nodes update d71485f0-5c26-4438-8b9c-9648c8b2b4b5 '{ "bootenv": "rhel-7.3-server-install" }'

greg
2017-04-28 20:02
See if that clears it.

2017-04-28 20:04
@galthaus Nope, gets stuck in same loop

greg
2017-04-28 20:05
hmm - bigger hammer

greg
2017-04-28 20:05
rebar nodes redeploy d71485f0-5c26-4438-8b9c-9648c8b2b4b5

2017-04-28 20:09
@galthaus Nope, it is still stuck in that loop. Got a biggerer hammer?

greg
2017-04-28 20:09
The redeploy should have rebooted the node. If it didn't then try and reboot the node.

greg
2017-04-28 20:10
rebar nodes power d71485f0-5c26-4438-8b9c-9648c8b2b4b5 reboot

2017-04-28 20:11
@galthaus That forced the reboot

greg
2017-04-28 20:13
what is the node's bootenv?

2017-04-28 20:17
sledgehammer

greg
2017-04-28 20:17
okay that is good. Let's see what happens. It should redo the whole process.

2017-04-28 20:19
I will let you know if it works. Going to put all the missing drives into the hardware to add another 24 nodes.

greg
2017-04-28 20:20
cool

zehicle
2017-04-28 20:30
note: edits do NOT make it into the gitter channel...

greg
2017-04-28 20:31
yes - I know. Teach them to join slack. :slightly_smiling_face:

2017-04-28 21:22
Probably a silly question, but I'm brand new to rebar and trying to setup a bare metal environment. I've run the "run-in-system" script with provisioner and dhcp containers and then I found out that I need to get the RAID and BIOS tools :-P So I did that but I'm not totally sure how to re-run just raid-tools-install role.

greg
2017-04-28 21:25
There is a retry button in the ux. The annealer view top right spiral looking icon.

greg
2017-04-28 21:25
Select that.

greg
2017-04-28 21:25
The errors will be at the top. On the right, there should be a retry all button.

2017-04-28 21:26
hmmm.. no errors listed.

2017-04-28 21:33
here is what I have done so far: - pulled digital rebar - ran sudo ./run-in-system.sh --deploy-admin=local --access=host --con-provisioner --con-dhcp --admin-ip=<ip/subnet> - downloaded and copied RAID and bios tools into ~/.cache/.../files/raid

2017-04-28 21:33
but when I login to the UX, I don't see the "Provisioner" tab.

2017-04-28 22:01
Hi guys, with the quickstart deploy, I try to add drpcli bootenvs install bootenvs/centos-7... and it fails to explode. Where can I look for errors? The error message in the logs just ends with /sbin/selinuxenabled

2017-04-28 22:01
error code 255

2017-04-28 22:02
the bootenv should have some error messages

2017-04-28 22:02
Nothing actionable, seemingly: Explode ISO: explode_iso.sh failed for centos-7.3.1611-install: exit status 255", "Command output:\nExplode iso centos-7.3.1611 /root/drp-data/tftpboot /root/drp-data/tftpboot/isos/CentOS-7-x86_64-Minimal-1611.iso /root/drp-data/tftpboot/centos-7.3.1611/install\nExtracting /root/drp-data/tftpboot/isos/CentOS-7-x86_64-Minimal-1611.iso for centos-7.3.1611\n/sbin/selinuxenabled\n

2017-04-28 22:02
the install combines several actions, you may need to try them one at a time. Also, I think there's a --debug flag for the CLI

2017-04-28 22:02
Not OK to run as root?

2017-04-28 22:03
DRP must run w/ root privs

2017-04-28 22:04
but the CLI does not require it

2017-04-28 22:04
disk space?

2017-04-28 22:04
Plenty free.

2017-04-28 22:06
also, drpcli bootenvs list doesn't show the ID of a bootenv, so you can't know the ID to use with drpcli bootenvs destroy <ID>

2017-04-28 22:06
Unless it's "name"

2017-04-28 22:07
I'm not much help on this score, sorry

2017-04-28 22:08
Ok, so name == ID

2017-04-28 22:09
bootenvs destroy and create again put it in available: true

2017-04-28 22:09
Timing issue, mayhaps.

2017-04-28 22:22
Now I'm getting malformed basic auth strings:

2017-04-28 22:22
dr-provision2017/04/28 22:21:15.241830 Malformed basic auth string: dG9vbHMvZGlzY292ZXJ5LWxvYWQuc2g= [GIN] 2017/04/28 - 18:21:15 | 401 | 55.948µs | 192.168.1.1 | GET /api/v3/bootenvs dr-provision2017/04/28 22:21:20.085661 Bad auth header: Basic [GIN] 2017/04/28 - 18:21:20 | 401 | 42.698µs | 192.168.1.1 | GET /api/v3/bootenvs

2017-04-28 22:22
Can't login anymore.

greg
2017-04-29 02:05
Huh?

greg
2017-04-29 02:06
@newgoliath - selinux could keep explode iso from doing its explode work in the local directory or /var/lib/tftpboot

greg
2017-04-29 02:06
I haven't tried on a system selinux enabled.

greg
2017-04-29 02:07
With regard to auth, the RS_KEY should be username:password or -U and -P should be used.

greg
2017-04-29 02:07
Make sure RS_TOKEN is not set.
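
For what it's worth, the value in the "Malformed basic auth string" log line above is plain base64, and decoding it hints at why the server rejects it: HTTP Basic credentials are the base64 of `user:password`, and this value decodes to something with no colon in it. A small check, assuming GNU `base64 -d` is available (`check_basic_auth` is a hypothetical helper name, not a drpcli feature):

```shell
# usage: check_basic_auth BASE64_STRING
# Decodes the string and reports whether it has the user:password shape.
check_basic_auth() {
  decoded=$(printf '%s' "$1" | base64 -d 2>/dev/null) || {
    echo "not valid base64"
    return 1
  }
  case "$decoded" in
    *:*) echo "ok: user=${decoded%%:*}" ;;
    *)   echo "malformed: no colon in '$decoded'"; return 1 ;;
  esac
}
```

Running it on `dG9vbHMvZGlzY292ZXJ5LWxvYWQuc2g=` reports `tools/discovery-load.sh`, which looks like a script path that ended up where credentials were expected.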

greg
2017-04-29 02:08
@spencerwjensen - missing the provisioner tab means that the containers didn't come up correctly.

greg
2017-04-29 02:09
cd digitalrebar/deploy/compose

greg
2017-04-29 02:09
docker-compose ps

greg
2017-04-29 02:09
wait - start over - host should be HOST. You may get into a wonky mode that way.

greg
2017-04-29 02:10
Rerun the run-in-system command.

2017-04-29 05:39
Wow! Thanks @galthaus! I will give that a shot!

wdennis
2017-05-01 03:19
@greg - you going to be at DOD Austin this year?

greg
2017-05-01 03:23
Yes - I think the plan is for all of us to be around. Victor, Rob, and I.

wdennis
2017-05-01 03:24
Nice, would be great to meet you all...

greg
2017-05-01 03:24
Yes - put real faces with people.

wdennis
2017-05-01 03:25
I hear there's no Uber/Lyft down there yet; what's the local cab service(s)? (Don't plan on renting a car...)

greg
2017-05-01 03:26
There are replacements, but I'm not sure what they are. Should be able to google.

wdennis
2017-05-01 03:26
OK

greg
2017-05-01 03:28
It is a little funky. There are cabs, but @zehicle may know.

wdennis
2017-05-01 03:57
Did some research - there's some Austin ride-share services available - downloaded apps for RideAustin and Fasten

wdennis
2017-05-01 03:57
I need to get out and eat BBQ / maybe see some music when I'm down there... :grinning:

greg
2017-05-01 03:59
Yes -

2017-05-01 16:13
I've done an isolated install, created a subnet (on my eth2, overlapping the subnet already on the NIC) and when the machine rebooted PXE, drp showed logs of it discovering it, but then the dr-provisioner process died and now the UI hates the default password.

2017-05-01 16:13
I restarted the process with the same command - ./dr-provisioner --etc...

2017-05-01 16:13
[sic dr-provision]

2017-05-01 16:14
for the webserver, go back to the root (w/o UI)

2017-05-01 16:15
BTW, got around the explode issue by turning off selinux. :(
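
A quick pre-flight check along these lines might save a failed explode. It is a sketch that assumes the usual SELinux userland (`getenforce`, `setenforce`) is installed; `selinux_mode` and `warn_if_enforcing` are made-up helper names.

```shell
# Prints Enforcing, Permissive or Disabled, or Unknown when the
# SELinux tools are not installed on this host.
selinux_mode() {
  if command -v getenforce >/dev/null 2>&1; then
    getenforce
  else
    echo "Unknown"
  fi
}

# Returns non-zero (and prints a hint) when SELinux is enforcing.
warn_if_enforcing() {
  if [ "$(selinux_mode)" = "Enforcing" ]; then
    echo "SELinux is enforcing; the ISO explode step may fail."
    echo "'setenforce 0' switches to permissive until the next reboot."
    return 1
  fi
  echo "SELinux is not enforcing"
}
```

Calling `warn_if_enforcing` before `drpcli bootenvs install ...` keeps the workaround temporary instead of disabling SELinux outright.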

2017-05-01 16:16
tried to just call the IP and port, but creds still failed.

2017-05-01 16:17
It forwarded to /ui

2017-05-01 16:17
dr-provision2017/05/01 16:16:46.489545 Bad auth header: Basic

greg
2017-05-01 16:20
okay - dr-provision died. I've been seeing some timeouts on DHCP writes. I'm going to address that. UI hating the default password seems strange. For the UI, it is easiest to use https://<ip>:8092/ui/?token=rocketskates:r0cketsk8ts

greg
2017-05-01 16:20
at least for me.

2017-05-01 16:21
Thanks, Greg. token= works.

greg
2017-05-01 16:23
For your selinux issue, was that your isolated install or the "production" install?

2017-05-01 16:23
only ever tried ISOLATED

greg
2017-05-01 16:24
okay - wow - so selinux prevents updates to a local directory. thanks. I'll add that info to the issue.

2017-05-01 16:26
My dhcp client host is gone. :(

2017-05-01 16:27
IPMI port not plugged in for that host. :(

greg
2017-05-01 16:27
yikes - did we make it worse?

2017-05-01 16:28
failing the DHCP -> TFTP handoff ... maybe the host will timeout somehow.

2017-05-01 16:28
Can't tell if "nexthost" ever made it to the client.

greg
2017-05-01 16:29
How many machines? I just found something on that myself. Trying to isolate and fix. I have something that makes it "better". Not sure it completely fixes it.

2017-05-01 16:29
Just one.

2017-05-01 16:29
I'm off to that datacenter at the end of the week anyway. More serial cables - belt and suspenders.

greg
2017-05-01 16:30
hmm - didn't see it with just one. More with 4 - though if drp is seeing extra dhcp requests, we'll still read them to ignore them. I have a timeout on writes that seems to make things worse. New code in a bit that removes that write timeout.

2017-05-01 16:31
There are two NICs on another host that are making DHCP client noise.

greg
2017-05-01 16:31
nope - still get it more. Research on my side.

2017-05-01 17:40
@galthaus Greg - do you have time to Skype? I only got 1 success out of 16. Default CentOS copy of kickstart is partitioning badly and disk is full even though it is a 1 TB drive.

wdennis
2017-05-01 17:51
@greg - On http://provision.readthedocs.io/en/latest/doc/arch/data.html#template I don't see entries for: .Env.OS.Family .Env.OS.Version - since they are used in the templates (at least for U16.04 preseed and post-install), perhaps they should be documented?

greg
2017-05-01 18:06
@intendo - not at the current moment. You are using DR. From Sledgehammer, you can see the drive order and what the drives are. Is there an extra USB drive or something messing with enumeration?

greg
2017-05-01 18:07
@wdennis - okay - Yes - they should be. The ubuntu/debian templates need them. Kinda. They are part of the bootenv values and on settable as parameters on the machine or profiles.

wdennis
2017-05-01 18:08
OK - maybe I'll talk to you guys about joining the docs team when I see you all in a few days - since I'm "beginner mind" and trying to learn this, I think I may be able to contribute

wdennis
2017-05-01 18:09
So, another q - what's the best (easiest) way of updating DRP to latest (stable)?

greg
2017-05-01 18:09
that would be cool. Just need to add it to the pile of things to get around to.

greg
2017-05-01 18:09
Have a PR on that in just a bit.

wdennis
2017-05-01 18:10
Want to try adding a profile for a group of machines to set default params, then apply to machines

wdennis
2017-05-01 18:11
Not sure I'm on a version that can do that... [dradmin@dr-admin ~]$ ./drpcli version Version: v2.9.1003-tip-73-8918047a73649afac5db926a254a8af88a8cefe5

greg
2017-05-01 18:11
drpcli profiles

wdennis
2017-05-01 18:12
```
[dradmin@dr-admin ~]$ ./drpcli profiles
Error: unknown command "profiles" for "drpcli"
Run 'drpcli --help' for usage.
```

greg
2017-05-01 18:13
don't have it. I was pretty sure, but now we know.

wdennis
2017-05-01 18:22
@greg - so the question is, how to upgrade an existing DRP install?

greg
2017-05-01 18:27
Yes - I know. I have a doc for that. I haven't committed it.

greg
2017-05-01 18:29
trying to render to show you.

wdennis
2017-05-01 18:30
Ah, OK

wdennis
2017-05-01 18:30
Thx

greg
2017-05-01 18:31
```
Upgrade

While not glamorous, you can install over the existing code and restart. That is about it. Here are a few more details.

Steps

For isolated Install, update this way:

1. Stop dr-provision: killall dr-provision
2. Return to your install directory
3. Run the install again

       rm sha256sums
       # Remember to use --drp-version if you want something other than stable
       # Curl/Bash from quickstart if you truly believe, or this:
       tools/install.sh --isolated install

4. Restart dr-provision, as stated by the tools/install.sh output.

For non-isolated Install, update this way:

1. Stop dr-provision, using your system method of choice:
   systemctl stop dr-provision or service dr-provision stop
2. Install new code - However you installed before, do it again. Install
3. Start up dr-provision:
   systemctl start dr-provision or service dr-provision start

Version to Version Notes

In this section, notes about migrating from one release to another will be added.

v3.0.0 to v3.0.1

If parameters were added to machines or global, these will need to be manually readded to the machine or global profile, respectively. The machine's parameter setting cli is unchanged. The global parameters will need to be changed to a profiles call.

    drpcli parameters set fred greg
to
    drpcli profiles set global fred greg

v3.0.1 to v3.0.2

Nothing known to be required to be done.
```
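
The isolated-install steps from that doc could be wrapped up roughly as below. This is a sketch, not an official upgrade script: `run`, `upgrade_isolated` and `DRY_RUN` are invented names, and only the commands the doc itself lists are used.

```shell
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# usage: upgrade_isolated INSTALL_DIR
upgrade_isolated() {
  run killall dr-provision                  # stop the running server
  run cd "$1" || return 1                   # return to the install directory
  run rm sha256sums                         # as in the doc, before reinstalling
  run tools/install.sh --isolated install   # reinstall over the existing code
  echo "Now restart dr-provision as printed by tools/install.sh"
}
```

Trying it first as `DRY_RUN=1 upgrade_isolated ~/drp` (the path is just an example) shows the exact commands before anything is stopped.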

greg
2017-05-01 18:31
something

wdennis
2017-05-01 18:48
OK, looks good...
```
[dradmin@dr-admin ~]$ ./drpcli version
Version: v3.0.1-0-730b0a596e1b6fa2103f52e3d19fb9c3f9b2a9af
[dradmin@dr-admin ~]$ ./dr-provision --version
dr-provision2017/05/01 14:33:39.707080 Version: v3.0.1-0-730b0a596e1b6fa2103f52e3d19fb9c3f9b2a9af
```

wdennis
2017-05-01 18:51
:cry: Still no "profiles" subcommand to drpcli?

wdennis
2017-05-01 18:53
```
Available Commands:
  autocomplete Rocket-Skates CLI Command Bash AutoCompletion File
  bootenvs     Access CLI commands relating to bootenvs
  files        Commands to manage files on the provisioner
  help         Help about any command
  interfaces   Access CLI commands relating to interfaces
  isos         Commands to manage isos on the provisioner
  leases       Access CLI commands relating to leases
  machines     Access CLI commands relating to machines
  prefs        List and set DigitalRebar Provision operational preferences
  reservations Access CLI commands relating to reservations
  subnets      Access CLI commands relating to subnets
  templates    Access CLI commands relating to templates
  users        Access CLI commands relating to users
  version      Rocket-Skates CLI Command Version
```

greg
2017-05-01 19:02
hmm - okay trying to look

greg
2017-05-01 19:03
of course not - it is still in tip. Need to cut a release to include it. Sigh.

greg
2017-05-01 19:03
Was hoping to have some additional updates with that.

wdennis
2017-05-01 19:44
No worries- will work with what I have atm

wdennis
2017-05-01 19:45
I do have a problem tho :confused:

wdennis
2017-05-01 19:46
Booted a node to SH, came up in the UX, set the bootenv to U16.04 install; rebooted on PXE and booted back into SH again...

wdennis
2017-05-01 19:47
What's the l/p for SH again? Not "rebar / rebar1" or "rocketskates / RocketSkates"

greg
2017-05-01 19:48
root / rebar1

wdennis
2017-05-01 19:49
I still have the "ubuntu-16.04-install" bootenv showing in UX, think it got messed up by update??

greg
2017-05-01 19:50
Update won't update the templates or bootenvs.

wdennis
2017-05-01 19:51
Ok, logged in to the node which is up on SH, it does have the expected IP addr...

wdennis
2017-05-01 19:52
The UX doesn't show the nodes MAC addr, that's what it actually keys off of right?

greg
2017-05-01 19:52
no - ip

wdennis
2017-05-01 19:52
Really

greg
2017-05-01 19:53
dhcp uses mac to map to ip. templates are rendered by ip usually.

wdennis
2017-05-01 19:53
Thought it would be MAC...

wdennis
2017-05-01 19:53
Yes, OK

greg
2017-05-01 19:53
because of the pxelinux chain.

wdennis
2017-05-01 19:54
So you are depending on DHCP to hand out the same IP between reboots (which should be the case...)

greg
2017-05-01 19:54
yes .


wdennis
2017-05-01 19:57
So you can see the BootEnv is set to the U16.04 install...

wdennis
2017-05-01 19:58
Rebooting the node again, will see what happens...

greg
2017-05-01 19:59
ok

greg
2017-05-01 19:59
hmm - not sure.

wdennis
2017-05-01 20:02
Nope, got SH again...

greg
2017-05-01 20:04
In the UI, check the machines listing and see if two nodes have that IP.


greg
2017-05-01 20:05
it got a different IP.

wdennis
2017-05-01 20:07
Only during that phase I guess -- it has the "correct" one now on my DHCP server...

greg
2017-05-01 20:08
Okay - sooooo - I know what is going on, I think, and we do some special "magic" in our DHCP server to deal with it.


wdennis
2017-05-01 20:09
No leases recorded for 192.168.1.111

greg
2017-05-01 20:09
Are those bindings (reserved) or are they current state?

wdennis
2017-05-01 20:10
Current state

greg
2017-05-01 20:10
DHCP servers are supposed to pay attention to the client identifier field of the request.

greg
2017-05-01 20:11
Our DHCP server only pays attention to the chaddr field - MAC Address.

wdennis
2017-05-01 20:12
My DHCP comes from a pfSense OS box

greg
2017-05-01 20:12
That way pxelinux, linux, ipxe - get the same ips.
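
To make the difference concrete, here is a toy lease table (not DRP's or pfSense's actual code) where the only thing that changes is what the server uses as the lease key. The client-id strings and the `192.168.1.x` pool are placeholders; `assign_ip` returns its result in the `ASSIGNED_IP` variable.

```shell
LEASES=""        # one "key ip" pair per line
NEXT_HOST=100    # next free host number in the toy pool

# usage: assign_ip MODE MAC CLIENT_ID   (MODE is "chaddr" or "clientid")
assign_ip() {
  mode=$1; mac=$2; client_id=$3
  if [ "$mode" = "chaddr" ]; then
    key=$mac                      # key on the hardware address only
  else
    key=${client_id:-$mac}        # key on option 61 when the client sends it
  fi
  # Reuse an existing lease for this key if there is one.
  ASSIGNED_IP=$(printf '%s\n' "$LEASES" | awk -v k="$key" '$1 == k { print $2; exit }')
  if [ -z "$ASSIGNED_IP" ]; then
    ASSIGNED_IP="192.168.1.$NEXT_HOST"
    NEXT_HOST=$((NEXT_HOST + 1))
    LEASES="$LEASES
$key $ASSIGNED_IP"
  fi
}
```

With `chaddr` keying, a PXE ROM and the OS it later boots share one MAC and so one address; with `clientid` keying, each boot stage that sends a different identifier is treated as a new client.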

greg
2017-05-01 20:12
looking.


wdennis
2017-05-01 20:13
This has always worked on Cobbler, and DRP 2.9

wdennis
2017-05-01 20:13
(192.168.1.148 is IP of DRP server)

greg
2017-05-01 20:14
okay - not sure then.

greg
2017-05-01 20:15
But the assignment of the IP isn't from DRP

greg
2017-05-01 20:15
Not sure why that would change anything.


wdennis
2017-05-01 20:18
Here's how I started DRP

wdennis
2017-05-01 20:24
Doing a packet cap to see where that IP is coming from...

greg
2017-05-01 20:28
yeah - with that, we shouldn't be running anything on port 67.


wdennis
2017-05-01 20:58
Not quite sure how to interpret this...

wdennis
2017-05-01 20:59
I see the pfSense router pinging IPs .1.111, .1.123, and finally .1.104 during the PXE boot process

wdennis
2017-05-01 21:00
Those IPs correspond to IPs I see on the node while it is PXE-booting

greg
2017-05-01 21:02
Your DHCP server in this trace:

greg
2017-05-01 21:02
Got a discover and started the process by pinging 1.111.

greg
2017-05-01 21:02
It then got a request, I don't know with what content.

greg
2017-05-01 21:03
It then got another discover, to which it started a ping on 1.123.

greg
2017-05-01 21:03
It timed out and responded with an Offer.

greg
2017-05-01 21:03
The offer was requested and acked.

greg
2017-05-01 21:03
60 seconds or so later, a release on 1.123 was done and soon thereafter the process repeats but with 1.104.


greg
2017-05-01 21:05
Can you open the first request and check the server ip/id?

greg
2017-05-01 21:05
You may have two dhcp servers running.

wdennis
2017-05-01 21:05
Looks like the 3rd packet was a request for IP .1.111

wdennis
2017-05-01 21:07
I don't see the "Offer" packet from the pfSense box...

wdennis
2017-05-01 21:07
(Or anywhere...)

greg
2017-05-01 21:07
yeah - It isn't in the list.

greg
2017-05-01 21:08
in that first request, there should be a server id or server identifier option? That should be the IP of the DHCP server that gave the offer that request is using. Since the packet is broadcast, it is supposed to be from a broadcast offer or a very late stage renew.

wdennis
2017-05-01 21:08
Should only be one DHCP server running on this net - the pfSense router

wdennis
2017-05-01 21:08
(Which is where the packet cap came from)


wdennis
2017-05-01 21:11
DHCP server ID is 192.168.1.254, which is the router

greg
2017-05-01 21:11
that looks good.

greg
2017-05-01 21:12
On the request or discover that gets 1.104, can you look at the client identifier option (97), I think,

greg
2017-05-01 21:12
sorry 61

wdennis
2017-05-01 21:18
Now I'm pissed off enough to throw a hardware tap on the Ethernet segment to the server :rage:

wdennis
2017-05-01 21:18
We'll get all them packets yet!

greg
2017-05-01 21:18
:slightly_smiling_face:

greg
2017-05-01 21:19
Did the client identifier look different for the second set of IP assignments?

2017-05-01 21:19
Minimum "inactive" lease time is 7200 sec? I can't reduce it to debug?

2017-05-01 21:19
Or is that just a gui thing, and I can do what I want on CLI?

2017-05-01 21:19
inactive = reserved

2017-05-01 21:20
that is, once it's successfully leased.

greg
2017-05-01 21:20
oh - different person. My brain kicked in. Sorry. checking.

2017-05-01 21:20
I erased all my leases on DRP, and I'm waiting kinda patiently.

2017-05-01 21:20
:-)

greg
2017-05-01 21:20
no - reserved - is for things handed out by explicit reservation.

greg
2017-05-01 21:21
active is for unknown things regardless of whether we keep handing the same thing back.

greg
2017-05-01 21:21
But, let me check the mins.

2017-05-01 21:21
Then perhaps I'm confused by terminology. I thought "active" meant the DHCP server is actively trying to establish that IP on the client.

2017-05-01 21:21
and "inactive" means it's set - reserved or dynamic.

greg
2017-05-01 21:21
no

greg
2017-05-01 21:21
bad terms possibly on our part.

greg
2017-05-01 21:22
The "Active" lease time is for the addresses in the "Active" range of the subnet. The "Active" range of the subnet is used for unknown / unreserved nodes that are trying to DHCP

greg
2017-05-01 21:23
Reserved Lease time is for entries that have explicit Reservation objects in the database that map a MAC to an IP.

2017-05-01 21:23
gotcha. not standard protocol definitions.

greg
2017-05-01 21:24
probably not. matches DR previous definitions.

greg
2017-05-01 21:24
7200 is hard-coded minimum on back end.

greg
2017-05-01 21:24
60 is hard-coded minimum on back end for active.

2017-05-01 21:24
Makes sense of "reserved."

2017-05-01 21:25
and 60 is fine, as long as the DHCPd doesn't timeout waiting to set the value.

greg
2017-05-01 21:25
May want to expose those as preferences at some point.

2017-05-01 21:25
because then there's possibly a race condition - between "active - unset" and "active - set"

greg
2017-05-01 21:25
Yeah - 60 is tight in my opinion, but matches what we've run

greg
2017-05-01 21:26
for DR.

2017-05-01 21:26
If my dhcp client wakes up again ($DEITY willing) and asks for an IP, what's the chance it'll get sledge or something on it? Any special magic to install centos-7?

greg
2017-05-01 21:27
undiscovered, high for sledgehammer.

2017-05-01 21:27
other than drpcli bootenvs install centos and setting it as the default?

greg
2017-05-01 21:27
Already discovered, gets what bootenv is set to.

greg
2017-05-01 21:27
not default.

2017-05-01 21:27
I see no "machines"

greg
2017-05-01 21:28
yes default.

greg
2017-05-01 21:28
sorry

greg
2017-05-01 21:28
hmm - no machines means sledgehammer didn't finish or get loaded. Soooo, sledgehammer is the dream of a booting unknown machine.

2017-05-01 21:33
So, sledgehammer is in a dream state? nmap and ping show nothing

greg
2017-05-01 21:34
well - if the node pxes, then you should get sledgehammer. SSH should be open and root/rebar1 should allow you in, if it is the real sledgehammer.

2017-05-01 21:34
sad face

2017-05-01 21:35
http://provision.readthedocs.io/en/latest/doc/workflows.html <- nicely done.


2017-05-01 21:37
Hey guys, so I'm still trying to get the "provisioner" up. I've started over from scratch running the following: `sudo ./run-in-system.sh --deploy-admin=local --con-provisioner --con-dhcp --access=HOST --admin-ip=10.54.4.118/23` and everything seems to have worked. When I run `docker-compose ps` I can see "compose_provisioner_1" and it says it is up. But in the UX, I still do not see the "Provisioner" tab. any thoughts?

wdennis
2017-05-01 21:37
Don't understand it - I see the node do a discover, don't see an offer back, but then do a request for .1.111

greg
2017-05-01 21:39
@wdennis - yeah - I don't understand that

greg
2017-05-01 21:40
@spencerwjensen - provisioner tab requires revproxy to find the provisioner.

greg
2017-05-01 21:41
https://ip of admin node/health

greg
2017-05-01 21:41
That should show some nice things.

greg
2017-05-01 21:41
Also in the ux deployments, is the system deployment green or yellow?

wdennis
2017-05-01 21:42
@greg - maybe this is the problem??


wdennis
2017-05-01 21:44
See a bunch of these in a row, then see the node booting sledgehammer


greg
2017-05-01 21:45
so - we don't build those files.

greg
2017-05-01 21:46
We rely on IPs. We have never done that for DRP and DR hasn't done it for years.

wdennis
2017-05-01 21:47
So that's a normal part of the PXE process?

greg
2017-05-01 21:48
yeah - pxelinux walks a set of files to get its info.

wdennis
2017-05-01 21:48
Ah

greg
2017-05-01 21:48
eventually getting to pxelinux.cfg/default

wdennis
2017-05-01 21:48
Yes, I see that

greg
2017-05-01 21:49
The preceding ones are IP-based, then that mac file, and then default.
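
For reference, the file names PXELINUX walks can be computed ahead of time. As I understand the Syslinux documentation, it asks for a client-UUID file first (if the firmware supplies one), then the `01-`-prefixed MAC file, then the IP in uppercase hex with one digit dropped per attempt, and finally `default`. The sketch below skips the UUID step; `pxelinux_candidates` is a made-up helper name.

```shell
# usage: pxelinux_candidates MAC IPV4
# Prints the config paths PXELINUX would request, in order.
pxelinux_candidates() {
  # MAC file: ARP hardware type 01 (Ethernet), lowercase hex, dash-separated.
  mac=$(printf '%s' "$1" | tr 'A-F:' 'a-f-')
  echo "pxelinux.cfg/01-$mac"
  # Split the dotted quad into $1..$4 and render it as 8 uppercase hex digits.
  oldifs=$IFS; IFS=.; set -- $2; IFS=$oldifs
  hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
  # Drop one trailing hex digit per attempt until nothing is left.
  while [ -n "$hex" ]; do
    echo "pxelinux.cfg/$hex"
    hex=${hex%?}
  done
  echo "pxelinux.cfg/default"
}
```

Comparing that list with the TFTP requests in a packet capture shows which of the "file not found" lines are just the normal walk.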

2017-05-01 21:49
from /health: ``` {"Map":{"dhcp-mgmt-service":["10.54.4.118:6755"],"dns-mgmt-service":["172.17.0.7:6754"],"rebar-api-service":["172.17.0.11:3000"],"rule-engine-service":["172.17.0.2:19202"]},"Matcher":{"dhcp-mgmt-service":"^dhcp/(.*)","dns-mgmt-service":"^dns/(.*)","rebar-api-service":"^rebar-api/(.*)","rule-engine-service":"^rule-engine/(api/.*)"},"Default":"rebar-api-service"} ```

greg
2017-05-01 21:50
@spencerwjensen, that would do it.
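
The same check can be scripted against the /health output. A rough sketch using only sed and grep (a JSON tool such as jq would be more robust); `health_services` and `has_provisioner` are hypothetical helpers, and matching on the word `provisioner` is an assumption, since the thread does not show what name the provisioner registers under.

```shell
# Reads the /health JSON on stdin and prints the service names under "Map".
health_services() {
  sed 's/.*"Map":{\([^}]*\)}.*/\1/' | grep -o '"[^"]*":\[' | tr -d '":['
}

# Succeeds only if some registered service name mentions "provisioner".
has_provisioner() {
  health_services | grep -q 'provisioner'
}
```

Something like `curl -sk https://<ip of admin node>/health | has_provisioner || echo "provisioner not registered"` would flag the situation above, where only the dhcp, dns, rebar-api and rule-engine services are listed.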

wdennis
2017-05-01 21:50
Well, still have NO idea where the node is getting a -.1.111 IP from...

2017-05-01 21:50
and when I check the system deployment it is actually red right now... in the past it was always yellow, never green... but now it is red because of dhcp-mgmt_service.

greg
2017-05-01 21:51
@spencerwjensen - you have a problem somewhere. Those have timed out and are red.

greg
2017-05-01 21:51
cd digitalrebar/deploy/compose

wdennis
2017-05-01 21:51
Let's boot a different node & see what happens...

greg
2017-05-01 21:51
docker-compose logs -f provisioner

greg
2017-05-01 21:51
and see what spits out if anything. It could be looping and failing.

greg
2017-05-01 21:52
Consul is up and some things seem to have registered. So, that is good.

2017-05-01 21:53
> to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh >provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current >provisioner_1 | Dload Upload Total Spent Left Speed >100 18 100 18 0 0 3420 0 --:--:-- --:--:-- --:--:-- 3600 >provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh >provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current >provisioner_1 | Dload Upload Total Spent Left Speed > 0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out

2017-05-01 21:54
it appears to be timing out trying to connect to s3?

greg
2017-05-01 21:54
yes - yes it does

greg
2017-05-01 21:54
It attempts to get sledgehammer

2017-05-01 21:55
I am behind a proxy but I have set the proxy environment variables for the system, for docker, and for yum.

2017-05-01 21:55
are there any specific proxy settings I need to set elsewhere?

greg
2017-05-01 21:55
hmmm - okay - just a minute.

greg
2017-05-01 22:00
This is a bug like thing.

greg
2017-05-01 22:01
in digitalrebar/deploy/compose

greg
2017-05-01 22:01
cat access.env

greg
2017-05-01 22:01
does it have any proxy vars?

2017-05-01 22:02
>USE_OUR_PROXY=YES
>EXTERNAL_IP=10.54.4.118/23
>FORWARDER_IP=
>CONSUL_JOIN=10.54.4.118
>DR_START_TIME=1493670448
>RUN_NTP=YES

greg
2017-05-01 22:02
So add to that:

greg
2017-05-01 22:02
UPSTREAM_HTTP_PROXY=$http_proxy
UPSTREAM_HTTPS_PROXY=$https_proxy
UPSTREAM_NO_PROXY=$no_proxy

2017-05-01 22:03
oh nice!! okay, once added do I need to restart any services?

greg
2017-05-01 22:03
Replace the $http_proxy, $https_proxy and $no_proxy with your items

greg
2017-05-01 22:03
yes - this:

greg
2017-05-01 22:03
docker-compose restart provisioner
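
Those two steps could be scripted roughly like this. It is a sketch under assumptions: `add_upstream_proxy_vars` is an invented helper, it takes the proxy values from the usual `http_proxy`, `https_proxy` and `no_proxy` environment variables, and it leaves any UPSTREAM_* line already in the file alone.

```shell
# usage: add_upstream_proxy_vars ENV_FILE
# Appends the three UPSTREAM_* settings unless the file already has them.
add_upstream_proxy_vars() {
  env_file=$1
  for pair in \
    "UPSTREAM_HTTP_PROXY=${http_proxy:-}" \
    "UPSTREAM_HTTPS_PROXY=${https_proxy:-}" \
    "UPSTREAM_NO_PROXY=${no_proxy:-}"
  do
    name=${pair%%=*}
    # Only append when no line for this variable exists yet.
    grep -q "^$name=" "$env_file" 2>/dev/null || printf '%s\n' "$pair" >> "$env_file"
  done
}
```

From digitalrebar/deploy/compose that would be `add_upstream_proxy_vars access.env` followed by `docker-compose restart provisioner`.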

greg
2017-05-01 22:04
Soooo what is going on and an issue to open -

greg
2017-05-01 22:05
The system can have a web proxy (the squid/webproxy container). When it does, it uses that for everything. You can specify the upstream for our proxy to use, but it tries to get it from the local system. The problem is that the provisioner bypasses that proxy on start up to load sledgehammer and uses those upstream vars. Those aren't set when using our internal proxy.

greg
2017-05-01 22:06
So we have a start-up race condition when proxies are involved. I guess we haven't been testing it that much since we left Dell.

2017-05-01 22:06
Time to feed the :bear:!

2017-05-01 22:07
Ha ha!! That's funny! I'm at Intel, so I totally know what you mean! :-P

greg
2017-05-01 22:09
@spencerwjensen - I may want to talk with you at some point about hardware. and find out more what you are doing.

wdennis
2017-05-01 22:09
Whelp, same thing on a different node... boots 1st time to SH as expected, but then after setting the bootenv to U16.04-install, it continues to boot SH every time thereafter...

2017-05-01 22:12
Happy to chat offline! I work in the Data Center Solutions Group. Currently using Cobbler, and Ansible, amongst other tools to manage racks of servers in our labs.

greg
2017-05-01 22:14
yeah - @wdennis - I'm not sure how this ever worked for you. I think the DHCP server is working correctly, but we make this work with ours. The client identifiers are different and the server gives out different addresses unless you MAC lock them in the DHCP server.

greg
2017-05-01 22:14
You could try that if it is an option. Bind the mac to the same address all the time.

greg
2017-05-01 22:15
cobbler may be building the mac files to work with pxelinux / dhcp badness. I'll need to think about that.

2017-05-01 22:15
So I added the proxy info as you said and restarted the service but still seeing a timeout in the logs. Here's what I added:
>USE_OUR_PROXY=YES
>EXTERNAL_IP=10.54.4.118/23
>FORWARDER_IP=
>CONSUL_JOIN=10.54.4.118
>DR_START_TIME=1493670448
>RUN_NTP=YES
>UPSTREAM_HTTP_PROXY=http://<url>:<port>/
>UPSTREAM_HTTPS_PROXY=http://<url>:<port>/
>UPSTREAM_NO_PROXY=<no_proxy list>

greg
2017-05-01 22:16
checking again.

greg
2017-05-01 22:19
hmm - that should have worked.

greg
2017-05-01 22:21
well - that is sad. So, we can do this instead.

greg
2017-05-01 22:21
when you run this do you get stuff?

greg
2017-05-01 22:21
ls ~/.cache/digitalrebar/tftpboot

2017-05-01 22:22
yup!

2017-05-01 22:22
>[root@master compose]# ls ~/.cache/digitalrebar/tftpboot/
>files ipxe.efi ipxe.pxe isos machines nodes pxelinux.cfg sledgehammer

2017-05-01 22:22
in the "root" directory right?

greg
2017-05-01 22:22
okay - you will need to do a couple of things but give me a minute.

greg
2017-05-01 22:23
yeah - that is fine. That is mounted into the containers to avoid downloading all the time.

2017-05-01 22:23
oh gotcha!

2017-05-01 22:24
sidebar.. is this Rob? or Greg? every time you type a message I see 2 names pop up! LOL

wdennis
2017-05-01 22:24
Hah @spencerwjensen - another Cobbler/Ansible guy, nice to meet you :grinning:

2017-05-01 22:24
:-P Likewise @wdennis

greg
2017-05-01 22:25
Greg

wdennis
2017-05-01 22:25
@greg Idk, everything was working swell with DRP 2.9 on the same router... maybe something in the upgrade mangled something??

greg
2017-05-01 22:25
if you do slack, I can invite you to that instead. It is "better".

2017-05-01 22:25
LOL! yes yes! I love slack! :-)

greg
2017-05-01 22:26
maybe - but I'm not sure. @wdennis

wdennis
2017-05-01 22:26
As in, more people use Slack :wink:

2017-05-01 22:26
LOL

greg
2017-05-01 22:26
Let me check.

greg
2017-05-01 22:26
spencerwjensen I need an email.

greg
2017-05-01 22:27
@spencerwjensen: - this as root

greg
2017-05-01 22:27
```
# Get sledgehammer
TFTPROOT=~/.cache/digitalrebar/tftpboot
PROV_SLEDGEHAMMER_SIG=a42c8c66a60b77ca1c769b8dc7e712f6644579ed
PROV_SLEDGEHAMMER_URL=http://opencrowbar.s3-website-us-east-1.amazonaws.com/sledgehammer
SS_URL=$PROV_SLEDGEHAMMER_URL/$PROV_SLEDGEHAMMER_SIG
SS_DIR=${TFTPROOT}/sledgehammer/$PROV_SLEDGEHAMMER_SIG
mkdir -p "$SS_DIR"
if [[ ! -e $SS_DIR/sha1sums ]]; then
    (curl -fgL -o "$SS_DIR/sha1sums" "$SS_URL/sha1sums")
    while read f; do
        (curl -fgL -o "$SS_DIR/$f" "$SS_URL/$f")
    done < <(awk '{print $2}' <"$SS_DIR/sha1sums")
    if ! (cd "$SS_DIR" && sha1sum -c sha1sums); then
        echo "Download of sledgehammer failed or is corrupt!"
        rm -f "$SS_DIR/sha1sums"
        exit 1
    fi
fi
```

wdennis
2017-05-01 22:27
I guess I'll burn it down and start over again tomorrow with a fresh copy... it's dinner time now

2017-05-01 22:27
oh right! spencerwjensen@gmail.com is my personal or spencer.w.jensen@intel.com. Either is fine to use for slack.

greg
2017-05-01 22:28
Yeah - I need to head out as well. I'll make a note to diff through some things, but not sure.

greg
2017-05-01 22:28
sent to gmail

greg
2017-05-01 22:31
I need to head home and do dad things. Back later tonight.

2017-05-01 22:31
:-) thanks for the help! That last script seems to be pulling stuff! I'll join the slack channel and talk to you later!

2017-05-01 22:31
thanks again!

greg
2017-05-01 22:32
np - @wdennis - I'll try and put out a release tonight with "fixes" for a lot of your requests.

spencerj
2017-05-01 22:32
has joined #json

spencerj
2017-05-01 22:52
in my system deployment, the "dhcp-mgmt_service" role is currently red and is currently set to "null". are there any examples of what this should look like?

greg
2017-05-01 23:17
once the script finished, did you restart the provisioner?

greg
2017-05-01 23:17
@spencerj - you will need to restart the provisioner. Once that finishes, you will need to retry the dhcp server role. You can do that from the annealer page in the UI (the spirally button in the top right).

wdennis
2017-05-01 23:21
Super frustrating- after totally wiping & reinstalling DRP, the nodes refuse to boot anything but SH still, even when bootenv is reset... :rage:

greg
2017-05-02 02:49
oh - sorry. I'm really certain that it is and I wonder if the lease cache in your dhcp server could be messing with it. It may not be simple to add a mac write out function. It might only work for lpxelinux.0, but that is what you are using. I think that is why we dropped it. The others didn't use it. Anyway, I'll think about it.

greg
2017-05-02 02:50
@intendo - let's try and hook up. Tuesday.

wdennis
2017-05-02 04:02
@greg - I intend to fire up my old Cobbler system tomorrow (just have to get some OS installs done!) and I'll do a packet capture on one of the nodes I install, we'll see just what Cobbler does...

greg
2017-05-02 04:08
sorry - be back stepped. I don't know what is going on.

greg
2017-05-02 04:19
I don't see any changes in the log that would account for this, but ...

wdennis
2017-05-02 04:21
OK, digging in to my pfSense router that is serving the DHCP; i see the following 3 stanzas in the dhcpd.leases file (running ISC dhcpd 4.2.6 on FreeBSD):
```
lease 192.168.1.111 {
  starts 1 2017/05/01 21:23:51;
  ends 1 2017/05/01 21:25:26;
  tstp 1 2017/05/01 21:25:26;
  cltt 1 2017/05/01 21:23:51;
  binding state free;
  hardware ethernet 00:25:90:ed:a2:04;
  uid "\000\000\000\000\000\000\000\000\000\000\000\000%\220\355\242\004";
}
lease 192.168.1.123 {
  starts 1 2017/05/01 21:24:44;
  ends 1 2017/05/01 21:25:15;
  tstp 1 2017/05/01 21:25:15;
  cltt 1 2017/05/01 21:24:44;
  binding state free;
  hardware ethernet 00:25:90:ed:a2:04;
  uid "\001\000%\220\355\242\004";
}
lease 192.168.1.104 {
  starts 1 2017/05/01 21:25:26;
  ends 1 2017/05/01 23:25:26;
  tstp 1 2017/05/01 23:25:26;
  cltt 1 2017/05/01 21:25:26;
  binding state free;
  hardware ethernet 00:25:90:ed:a2:04;
}
```

wdennis
2017-05-02 04:22
(I put them in the order that the IPs showed up in the PXE boot console output, which is correlated by the "starts" time)

wdennis
2017-05-02 04:24
Not sure, but looks like only the "uid" value (or the lack of one) is the differentiating factor? Wonder if ISC dhcpd hands out a different IP addr lease for every tuple of MAC ("hardware ethernet") and uid value (or lack of uid value)

wdennis
2017-05-02 04:26
Do you know where the "uid" value is getting set from - the discovery packet?

greg
2017-05-02 04:43
yes - this is what I've been saying but not well. We had this problem with ISC DHCP. We wrote our own for this. ISC has an option that basically says ignore uid and stick with mac only.

greg
2017-05-02 04:45
uid is option 61 in the discovery packet. The pxe prom on the nic, the kernel, isc dhcp server, and lpxelinux can and sometimes do use different ones. You have three in this case.
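
To make the three identities easier to compare, the `uid` strings in the lease file can be turned into hex. This is only a sketch: `decode_uid` is a made-up helper name, and it assumes the lease file writes non-printable bytes as three-digit octal escapes, as in the stanzas quoted above.

```bash
#!/usr/bin/env bash
# Decode a dhcpd.leases "uid" string (octal escapes mixed with printable
# characters) into plain hex so it can be compared against the MAC.
# decode_uid is a hypothetical helper, not part of dhcpd or DRP.
decode_uid() {
    local fmt
    # Double any literal % so the string is safe to use as a printf format.
    fmt=$(printf '%s' "$1" | sed 's/%/%%/g')
    # printf expands the \ooo escapes into raw bytes; od dumps them as hex.
    # shellcheck disable=SC2059
    printf "$fmt" | od -An -v -tx1 | tr -d ' \n'
    echo
}

# Second lease above: a leading 01 byte followed by the MAC 00:25:90:ed:a2:04.
decode_uid '\001\000%\220\355\242\004'   # prints 01002590eda204
```

The decoded value ends in the same six bytes as the `hardware ethernet` line, which fits the explanation that the same NIC is presenting itself with different client identifiers at different boot stages.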

greg
2017-05-02 04:55
@wdennis - do you know what version of ISC DHCP they are using? I found this option and we had to use dhcp version > 4.3. ignore-client-uids in the server config file will make the server only pay attention to mac instead of client identifier.
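
For a hand-managed ISC dhcpd (4.3 or newer, per the message above), the option is a one-line statement in the server config. The sketch below adds it idempotently; `ensure_ignore_client_uids` is a hypothetical helper, and the demo works on a throwaway temp file rather than a real `dhcpd.conf`. On pfSense the same setting is a checkbox in the DHCP server page, so none of this applies there.

```bash
#!/usr/bin/env bash
# Append "ignore-client-uids true;" to a dhcpd config if it is not already set.
# ensure_ignore_client_uids is an illustrative name, not an ISC tool.
ensure_ignore_client_uids() {
    local conf="$1"
    if ! grep -Eq '^[[:space:]]*ignore-client-uids[[:space:]]+(true|on)[[:space:]]*;' "$conf"; then
        printf 'ignore-client-uids true;\n' >> "$conf"
    fi
}

# Demo against a temp file (placeholder content, not a working config).
demo_conf=$(mktemp)
printf 'default-lease-time 600;\n' > "$demo_conf"
ensure_ignore_client_uids "$demo_conf"
ensure_ignore_client_uids "$demo_conf"   # second call is a no-op
cat "$demo_conf"
```

dhcpd would still need a restart to pick up the change, and old entries in `dhcpd.leases` may need to expire before a node settles on one address.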

greg
2017-05-02 05:01
Yeah - that is what we used to do when we had ISC DHCP in the mix. We built 4.3.X because it wasn't in the distros at the time, added that option, and all was good. We got tired of fighting ISC and wrote our own - rebar-dhcp. That is the basis for dr-provision.

wdennis
2017-05-02 11:06
Currently it's 4.2.6 -- but I'm running a downlevel version of pfSense at this point; perhaps a newer version has a higher ver of ISC dhcpd - I should upgrade in any case

wdennis
2017-05-02 11:08
Unfortunately on the network I'm running DRP on, I have to provide DHCP from the router...

wdennis
2017-05-02 15:15
OK, updated my pfSense router to latest, now I see this option in the DHCP server controls:


wdennis
2017-05-02 15:16
Checked it, will see what we get now...


wdennis
2017-05-02 15:31
That was it

greg
2017-05-02 15:35
Yeah!!

greg
2017-05-02 15:37
That is a bad part of the spec. I was in the IETF meetings when DHCP was being defined. We were more focused on clients getting IPs and not Servers installing.

wdennis
2017-05-02 15:46
i just read that the UID was to be used to differentiate between systems that dual-booted OS's?

vlowther
2017-05-02 15:47
That sounds like a post-hoc justification. :slightly_smiling_face:

greg
2017-05-02 15:56
probably, but we never used DHCP for things other than clients originally. The craze at the time was clients getting on networks, and diskless "dumb" terminals. Heck we were still installing AIX from floppies at the time.

greg
2017-05-02 15:57
nobody trusted DHCP for servers that had to have a known good address.

vlowther
2017-05-02 16:00
nods.

vlowther
2017-05-02 16:01
These days, in the time of massive server farms and teardown/rebuild instead of fix-in-place, though...

greg
2017-05-02 16:06
yeah - heck the follow-on spec that was worked on but not done was MobileIP. It was to be a layer on top of DHCP that allowed the networking stack to have a constant IP while you roamed DHCP subnets. I started to get that working on AIX (we actually had AIX client laptops at the time), but we stopped when the spec started getting hard to implement and people started making reconnection a priority in clients. It just wasn't a problem.

spencerj
2017-05-02 16:44
Hey @greg ! restarting the services seemed to do the trick. now I have the default Network and DHCP subnet. I'm not ready to open the range to my entire lab yet. I wanted to play around a bit more with the provisioner first. is it possible to add individual MACs somewhere?

spencerj
2017-05-02 16:45
in Cobbler I could do this through the dhcp files.

vlowther
2017-05-02 16:49
hm... you looking at just storing what mac address is associated with a machine, or more looking at making sure the same mac address always gets the same address?

spencerj
2017-05-02 16:51
I'm looking at getting rid of the dhcp range temporarily and statically assigning IPs to MACs.. If it works the same as cobbler, the provisioner should ONLY do something if DHCP request comes from a machine with a known MAC.

spencerj
2017-05-02 16:51
does that make sense?

vlowther
2017-05-02 16:51
Yes, :slightly_smiling_face:

vlowther
2017-05-02 16:53
The UI does not have support for it yet, but you are looking for DHCP reservations.

greg
2017-05-02 16:54
@spencerj - Remind me, are you using DR or DRP?

greg
2017-05-02 16:54
different commands for each.

spencerj
2017-05-02 16:54
LOL... I think just DR.. I'm not sure the difference though.

greg
2017-05-02 16:56
I think DR as well from what you told me earlier.

spencerj
2017-05-02 16:58
okay :slightly_smiling_face: so you have to set the reservations from the CLI?

greg
2017-05-02 17:05
DR is less like cobbler than that, but ...

spencerj
2017-05-02 17:14
oh okay, but are reservations possible?

greg
2017-05-02 17:17
Yeah - thinking about how to do it.

greg
2017-05-02 17:18
umm - @spencerj - did you ever tell me what your goal is with DR and/or DRP?

spencerj
2017-05-02 17:19
I don't think so! :slightly_smiling_face:

greg
2017-05-02 17:19
What I mean is - do you just want a cobbler replacement. Set a mac/IP map and install os, walk away? Or do you want workload orchestration, IPMI management, eventing, post install configuration in orchestrated way.

spencerj
2017-05-02 17:21
currently we just have the "Cobbler" piece where we have a MAC/IP mapping. known or "registered" systems can boot and get the PXE payload to build whatever profile (OS + Kickstarts) we assigned..

spencerj
2017-05-02 17:21
then AFTER the cobbler process we are manually kicking off the "configuration" stuff (Ansible).

spencerj
2017-05-02 17:24
we also have IPMI/BMC on all of our machines but again.. each of these pieces is sort of separate from one another. we have scripts for IPMI automation... scripts for Cobbler automation, and more scripts for Ansible automation.. but nothing really tying them together very well.

spencerj
2017-05-02 17:25
In a past life I was a TeamCity admin so I was starting to look at some options there with a TeamCity/Jenkins type setup to manage the scheduling/event side of things.. but then I struck just the right search query into Google and Digital Rebar popped up.

spencerj
2017-05-02 17:26
I kid you not, in ALL my past Googling, searching high and low for bare metal provisioning solutions, I never found DR.

greg
2017-05-02 17:26
we are finally trying to do that better.

spencerj
2017-05-02 17:26
oddly enough, it was actually a search string of "cobbler docker" that did the trick and I found some blog post that Rob had written! LOL

greg
2017-05-02 17:28
okay - good to know. We can do all of that over time. The challenge is that DR is opinionated on how it should be used. We are trying to undo that with DRP and slowly in DR.

greg
2017-05-02 17:28
We aren't there yet.

greg
2017-05-02 17:28
By default DR wants to tightly manage IPs through DHCP.

spencerj
2017-05-02 17:29
So the "Provisioner" in DR is not the same as DRP?

greg
2017-05-02 17:30
no - that is a coming change. DR's provisioner is the basis for DRP, but they are not the same. They have similar pieces, but DRP's is much more fleshed out, documented, and directly controllable.

spencerj
2017-05-02 17:32
ohhhhh... okay.. I misread the docs then.... I thought DRP was standalone but was also the "Provisioning" piece in DR..

spencerj
2017-05-02 17:32
that makes sense now! :slightly_smiling_face:

spencerj
2017-05-02 17:33
So.. based on what I've said so far then.. should I start with DRP?

greg
2017-05-02 17:35
well - it matches your current model better.

spencerj
2017-05-02 17:36
our environment is very "fungible".. constantly rebuilding and re-purposing HW... but all of the systems are "known", as in we want to control the IP assignment and avoid "rogue" systems building.

greg
2017-05-02 17:36
You have IPMI system, Provision system, other system.

greg
2017-05-02 17:36
DRP could be the Provision System

greg
2017-05-02 17:36
I'm working to hook it into DR. To drive events and such.

greg
2017-05-02 17:36
DRP may be easier to play with.

greg
2017-05-02 17:37
It has a Reservation system. You define MAC->IP reservations.

greg
2017-05-02 17:37
You then create Machines that map IPs->BootEnvs.

greg
2017-05-02 17:37
BootEnvs are the OSes you want to install.

greg
2017-05-02 17:37
The BootEnvs are like cobbler kickstart.

greg
2017-05-02 17:38
The machines have parameters and/or profiles that allow you to inject information globally or per node into the kickstarts.

greg
2017-05-02 17:38
All with CLI or API or UI.

greg
2017-05-02 17:39
The kickstarts are templated so that you can update machine values and chain bootenvs.


spencerj
2017-05-02 17:39
does DRP still use sledgehammer for discovery?

greg
2017-05-02 17:39
I've been doc'ing like a fiend.

greg
2017-05-02 17:39
yes

spencerj
2017-05-02 17:39
ha ha ha!


spencerj
2017-05-02 17:40
okay! and do the BootEnvs allow you to hook in "post" OS stuff? like Ansible Roles?

greg
2017-05-02 17:40
wjat

greg
2017-05-02 17:40
What you've described is a subnet for base options set to reserved only.

greg
2017-05-02 17:41
Then a bunch of reservations for your IP>MAC maps. This could include IPMI if you want to run DHCP bmcs, but ....

greg
2017-05-02 17:41
Then create a machine (or let sledgehammer discover and create).

greg
2017-05-02 17:41
Set the bootenv you want.

spencerj
2017-05-02 17:44
okay! and does DRP use PXE or iPXE? or both? is one defaulted?

greg
2017-05-02 17:44
by default, it serves files of lpxelinux.0, ipxe, and bootefi.*.

spencerj
2017-05-02 17:44
oh swet!

greg
2017-05-02 17:44
The image you use, is dependent upon what you want to boot from or some magic.

spencerj
2017-05-02 17:44
*sweet!


greg
2017-05-02 17:45
The options in the DHCP server are templated.

greg
2017-05-02 17:46
You can do this insanity: ```{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}} ```

greg
2017-05-02 17:46
which says: if option 77 is iPXE, use default.ipxe as the bootloader; else if option 93 is 0, use lpxelinux.0; otherwise use bootx64.efi

greg
2017-05-02 17:46
That way we handle ipxe, legacy bios, and uefi
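
The same three-way decision, written out as a plain shell function for readability. This is not how DRP evaluates it (DRP renders the template shown above); `choose_bootfile` is a made-up name and the two arguments stand in for the values of DHCP options 77 and 93.

```bash
#!/usr/bin/env bash
# Mirror of the bootfile template: option 77 (user class) first,
# then option 93 (client architecture), else fall through to UEFI.
# choose_bootfile is a hypothetical helper for illustration only.
choose_bootfile() {
    local opt77="$1" opt93="$2"
    if [ "$opt77" = "iPXE" ]; then
        echo "default.ipxe"      # client is already running iPXE
    elif [ "$opt93" = "0" ]; then
        echo "lpxelinux.0"       # arch 0: legacy BIOS
    else
        echo "bootx64.efi"       # anything else is treated as UEFI
    fi
}

choose_bootfile "iPXE" "0"   # prints default.ipxe
choose_bootfile ""     "0"   # prints lpxelinux.0
choose_bootfile ""     "7"   # prints bootx64.efi
```

The order matters: an iPXE client on BIOS hardware also reports architecture 0, so the option 77 check has to come first or it would be handed lpxelinux.0 again.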

greg
2017-05-02 17:47
Though the default is just: lpxelinux.0

greg
2017-05-02 17:47
because simple

spencerj
2017-05-02 17:47
LOL!

greg
2017-05-02 17:48
in theory, you could use the same thing to jump to arm or 32bit or whatever.

greg
2017-05-02 17:49
though I suppose ARM is a bad word for you. :slightly_smiling_face:

spencerj
2017-05-02 17:53
ha ha ha!

spencerj
2017-05-02 17:53
It's okay. I forgive you for that profanity. :stuck_out_tongue_winking_eye:

spencerj
2017-05-02 18:30
once I have the DRP code, how should I run the installer? I'm looking at the install docs for "install.sh". Are there only two ways to run? with or without --isolated?

wdennis
2017-05-02 18:36
@spencerj I used the curl cmd to download the installer bundle, then ran the install.sh myself when I had vetted it. As I'm running a demo of it, I did create a "drp" directory, put the install stuff there, and then ran the installer with "--isolated" which keeps all the things within the dir you ran the installer in.

spencerj
2017-05-02 18:45
okay.. so without the --isolated flag it will do a "normal" install?

spencerj
2017-05-02 19:16
So I just ran the install.sh script and got this error: ``` cp: cannot stat 'assets/startup/dr-provision.service': No such file or directory ```

spencerj
2017-05-02 19:16
when I `ls` that directory this is what I see:

spencerj
2017-05-02 19:16
```
[infra@master drp-install]$ ls assets/startup/
rocketskates.service rocketskates.sysv rocketskates.unit
```

greg
2017-05-02 19:17
Oops.

greg
2017-05-02 19:17
Fixing

spencerj
2017-05-02 19:18
ha ha ha!

2017-05-02 20:00
drp in-place updates preserve configs and data? just delete the sha file?

2017-05-02 20:02
for --isolated

greg
2017-05-02 20:04
drp in place creates a drp-data directory and shows the options to run that way.

greg
2017-05-02 20:05
To have install.sh pull an updated zip file from github, remove the sha file and re-run install.

2017-05-02 20:05
groovy.

2017-05-02 20:05
that's exactly what I meant. :-)

lae
2017-05-02 20:15
oh funny, spencer's story is literally my story (was using cobbler/ansible/in house scripts for ipmi and stuff, suddenly found DR and then DRP was made public afterwards)

greg
2017-05-02 20:18
:slightly_smiling_face: @vlowther and I are working on a plan to get IPMI/RAID/BIOS back into the fray and join DRP machines into DR as nodes.

2017-05-02 22:18
I'm confused which should be defaultBootEnv and unknownBootEnv. unknownBootEnv=discovery and defaultBootEnv=sledge?

2017-05-02 22:22
nevermind - found it in the Fine Documentation

greg
2017-05-02 22:23
Fine - as in wine - as in drunken ramblings.

2017-05-03 00:00
tftpd not responding...

2017-05-03 00:00
it's listening on udp:69

2017-05-03 00:03
if the mac is in the server's arp table as associated with an ip already, accessible through a server interface that's not listening tftpd, will it fail to respond to tftpd udp?

2017-05-03 00:06
the same mac is showing as known by the other interface that's actually running DRP

greg
2017-05-03 00:07
Trees falling. Noisily or not. Hmm. I think you are using DR. Are you in HOST or FORWARDER mode? If you didn't specify anything then you are in FORWARDER mode. The implication is that only things on docker0 will communicate with DR services. You could bridge into that network. If host mode then all interfaces are in play and will respond.

2017-05-03 00:08
DRP - no docker

2017-05-03 00:08
no bridges

2017-05-03 00:12
wat? why would ncat say 'no route to host' when ping sees the IP just fine?

2017-05-03 00:16
ah, iptables - no accepto port udp:69
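
A guess at why ncat said 'no route to host' while ping worked: many stock Linux firewall rulesets end with a REJECT rule that answers with ICMP host-prohibited, which clients report as "No route to host" even though ICMP echo is allowed. The sketch below only prints the kind of rule that would open TFTP; it assumes iptables is the active firewall, and `em2` is a placeholder interface name.

```bash
#!/usr/bin/env bash
# Print (not apply) an iptables rule that accepts inbound TFTP on one interface.
# tftp_accept_rule is an illustrative helper; review the rule before running
# it as root, and persist it with whatever mechanism the distro uses.
tftp_accept_rule() {
    local iface="$1"
    echo "iptables -I INPUT -i $iface -p udp --dport 69 -j ACCEPT"
}

tftp_accept_rule em2
```

Using `-I` puts the rule at the top of INPUT so it is evaluated before any trailing REJECT. TFTP data transfers come from an ephemeral server port, so connection tracking (or the TFTP helper) may also be involved depending on the ruleset.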

2017-05-03 00:17
et voir la!

2017-05-03 00:27
dr-provision2017/05/03 00:24:22.650999 DHCP handler died: write udp4 0.0.0.0:67->192.168.1.51:68: i/o timeout

2017-05-03 00:27
drp process died.

2017-05-03 00:30
any idea why a dell R410 would ignore and not reboot PXE when this returns OK? ipmitool chassis bootdev pxe; ipmitool chassis power cycle

vlowther
2017-05-03 01:17
latest DRP code cannot fail in that way -- we no longer rely on timeouts to cleanly release the DHCP socket to work around the darwin kernel's lack of ability to clean up UDP sockets belonging to nonexistent processes.

vlowther
2017-05-03 01:17
yes, really.

vlowther
2017-05-03 01:19
re: r410: because all IPMI firmware sucks.

vlowther
2017-05-03 01:20
messing with bootdev order via ipmi is something I basically assume will fail silently until I am happily proven wrong.

2017-05-03 01:21
lolz

vlowther
2017-05-03 01:23
tl;dr: never kill -9 processes with open listening UDP ports on a mac, unless you like rebooting.

2017-05-03 01:23
jeez.

2017-05-03 01:24
Is there any way to remotely set the bootdev? I see an entry for dell related stuffs in the ipmitool manual.

vlowther
2017-05-03 01:25
Yes, the IPMI standard has those methods, and some IPMI firmware even implements it properly.

vlowther
2017-05-03 01:26
I don't know if the idrac on a 410 is one of those --- it has been many years and more ethanol since I tried.

vlowther
2017-05-03 01:27
If you have a copy of racadm compatible with that box, it is probably a better bet than ipmitool.

2017-05-03 01:48
Once DRP finds a box and drops sledgehammer on it, shouldn't it be a "machine"? Or are those only for predefined machines?

2017-05-03 02:02
Because it didn't give me a "machine" object to play around with.

greg
2017-05-03 02:10
It should create a machine.

greg
2017-05-03 02:11
My guess is that your subnet is missing some parameters that sledgehammer needs. I tried to put this in the subnet config page, I think.

greg
2017-05-03 02:11
You can get the error log by:

greg
2017-05-03 02:11
logging into sledgehammer - root/rebar1

greg
2017-05-03 02:11
journalctl -u sledgehammer

greg
2017-05-03 02:12
That should have a clue as to what is missing or busted.

greg
2017-05-03 02:12
The darwin hack sadly makes it better, but occasionally it hits it still. It is better though. :neutral_face:

2017-05-03 02:13
already rebooted. my idrac access had annoying keyboard issues.. because mac->x2go->xfce->chrome->javaws->console.

2017-05-03 02:14
Does sledgehammer open the serial console? My ssh attempts seemed to require a key.

2017-05-03 02:24
d

greg
2017-05-03 02:30
d?

greg
2017-05-03 02:31
@newgoliath - it should, I think. ssh requires a key. Let me check something.

greg
2017-05-03 02:32
That is a good feature request to put back.

greg
2017-05-03 02:32
actually, some template in template love should fix this nicely.

greg
2017-05-03 02:33
I cleaned too much.

greg
2017-05-03 02:33
It should have a console, but I can put in the root-remote-access.tmpl and it should work in sledgehammer.

greg
2017-05-03 02:33
and discovery.

greg
2017-05-03 02:34
sooo - you can put {{ template "root-remote-access.tmpl" . }} in the sledgehammer start-up.sh template and it will enable ssh with keys in the access_keys parameter in the machine profile, profiles assigned to the machine, or the global profile.

greg
2017-05-03 02:34
I'll make that change shortly.

greg
2017-05-03 02:35
I need to cut a release, we've amassed some good changes.

2017-05-03 02:40
I'm fading... sleepy...

spencerj
2017-05-03 02:43
In my experience `chassis power cycle` reboots everything including the BMC so any temp flags are lost. After the `bootdev pxe` command I usually do a `chassis power reset` which seems to be more synonymous with a hard reboot, but BMC stays alive so the pxe flag stays.
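
That sequence as a small wrapper, as a sketch only. It defaults to a dry run that just prints the commands; `pxe_reset` and `run` are made-up helper names, the address and user are placeholders, and `-E` tells ipmitool to read the password from the `IPMI_PASSWORD` environment variable. Whether the boot flag survives is firmware-dependent, as noted earlier in the thread.

```bash
#!/usr/bin/env bash
# Set next boot to PXE, then do a warm reset so the BMC stays up.
# DRY_RUN defaults to 1, so nothing is sent to a BMC unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pxe_reset() {
    local host="$1" user="$2"
    run ipmitool -I lanplus -H "$host" -U "$user" -E chassis bootdev pxe
    run ipmitool -I lanplus -H "$host" -U "$user" -E chassis power reset
}

pxe_reset 192.0.2.10 root
```

With `DRY_RUN=0` and `IPMI_PASSWORD` exported, the same call sends both commands to the BMC.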

greg
2017-05-03 02:49
sleep my friend and dream of large women.

2017-05-03 02:50
struggling to remember the film you reference.

greg
2017-05-03 02:51
Princess Bride

greg
2017-05-03 02:51
I do not envy you your headache when you wake up.

greg
2017-05-03 02:51
My brain is a little off right now.

2017-05-03 02:51
hee hee.

greg
2017-05-03 06:17

greg
2017-05-03 06:17
Stables updated.


2017-05-03 13:42
grabbing and installing

2017-05-03 13:49
dr-provision2017/05/03 13:49:00.327087 Received option: OptionClientIdentifier: ??K?

greg
2017-05-03 13:51
yeah - some things send CID that aren't printable.

2017-05-03 14:00
interesting drp is responding dhcp on both my interfaces.

2017-05-03 14:00
sledge keeps requesting, over and over.

2017-05-03 14:01
every 30ish seconds

greg
2017-05-03 14:02
Yes - that is right. The lease time is 60 seconds. so renew time 30 seconds.

greg
2017-05-03 14:03
You can change that in the subnet definition.
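
The 30-second figure lines up with the usual DHCP defaults from RFC 2131, where the renew timer (T1) is half the lease and the rebind timer (T2) is 7/8 of it. A quick way to see what a different lease time would give, assuming the server does not override those timers (`lease_timers` is a made-up helper, and integer division rounds down):

```bash
#!/usr/bin/env bash
# Default DHCP timers for a given lease time in seconds:
#   T1 (renew)  = lease / 2
#   T2 (rebind) = lease * 7 / 8
lease_timers() {
    local lease="$1"
    echo "T1=$((lease / 2)) T2=$((lease * 7 / 8))"
}

lease_timers 60     # prints T1=30 T2=52
lease_timers 7200   # prints T1=3600 T2=6300
```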

2017-05-03 14:03
nmap says up, but all ports closed.

2017-05-03 14:03
I see your {{ template "root-remote-access.tmpl" . }} in the sledge.yaml

2017-05-03 14:03
Did bootenvs need to be reloaded, to re-run templates?

2017-05-03 14:05
no serial console output (but it's at least connecting)

greg
2017-05-03 14:07
Yes - templates need to be reloaded.

greg
2017-05-03 14:08
So do bootenvs.

2017-05-03 14:08
okdee doke.. thanks Greg!

greg
2017-05-03 14:08
Templates are easy. I have a bug I'm still working on to update bootenvs.

greg
2017-05-03 14:08
It may be easiest to:

greg
2017-05-03 14:08
stop dr-provision

greg
2017-05-03 14:08
rm -f drp-data/digitalrebar/bootenvs

greg
2017-05-03 14:08
rm -f drp-data/digitalrebar/templates

greg
2017-05-03 14:09
Then rerun the tools/discovery-load

greg
2017-05-03 14:09
and all the other bootenvs install commands from before.

greg
2017-05-03 14:09
The bootenvs update is a real issue.

2017-05-03 14:10
roger.

2017-05-03 14:12
should I bootenvs destroy before bootenvs install?

2017-05-03 14:21
chicken-egg problem with dr-provision and discovery-load ----- after the rm -f commands.

2017-05-03 14:21
:(

2017-05-03 14:22
I guess gotta clear out the configs for the default boot envs.

greg
2017-05-03 14:59
ugh - sorry. yeah prefs. will get in the way.

greg
2017-05-03 14:59
I should know better. Got to fix the bugs instead of kludging around them.

2017-05-03 15:02
I can't find out how to force start without the discovery

greg
2017-05-03 15:04
rm -rf drp-data/digitalrebar/preferences/*

greg
2017-05-03 15:05
then reset them in the UI once all is loaded. Sorry.
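
Pulling the workaround together as one function, as a sketch under assumptions: it treats `bootenvs`, `templates`, and `preferences` under the data root as directories (hence `rm -rf` rather than the `rm -f` typed above), and `reset_drp_content` is a made-up name. The demo runs against a temp directory with placeholder file names, not a real install.

```bash
#!/usr/bin/env bash
# Clear stored bootenvs, templates, and preferences so they can be reloaded.
# Stop dr-provision before running this against a real data root.
reset_drp_content() {
    local data_root="$1"
    # ${var:?} aborts if the argument is empty, so this can never hit "/".
    rm -rf "${data_root:?}/bootenvs" "${data_root:?}/templates"
    rm -rf "${data_root:?}/preferences"/*
}

# Demo on a scratch directory (file names are placeholders).
demo_root=$(mktemp -d)
mkdir -p "$demo_root/bootenvs" "$demo_root/templates" "$demo_root/preferences"
touch "$demo_root/bootenvs/example.json" "$demo_root/preferences/example.json"
reset_drp_content "$demo_root"
ls -A "$demo_root"
```

After that, the order from the thread would be: start dr-provision, rerun tools/discovery-load and the other bootenvs install commands, then set the default and unknown bootenv preferences again in the UI.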

2017-05-03 15:05
OH! DUH! there they are.

2017-05-03 15:46
Did you miss updating the version string in the executable?

2017-05-03 15:46
dr-provision2017/05/03 15:45:03.939046 Version: v3.0.1-tip-38-10d0a97fe90d2104bf9c2e7720529496fac4c033

2017-05-03 15:46
or did I not delete all the right stuff?

greg
2017-05-03 15:50
It should have automatically been updated

2017-05-03 15:50
nevermind.. I didn't pull the versioned one.

greg
2017-05-03 15:51
The next tip build will be relative to 3.0.2

2017-05-03 16:30
the client gets sledge and the first script is run, but it seems control.sh never happens.

2017-05-03 16:33
does it go out to the Internet? Because I've got no NAT running on my private IP address range that I'm using for DHCP.

greg
2017-05-03 16:36
it should not need internet, I think.

greg
2017-05-03 16:36
can you get on sledgehammer?

greg
2017-05-03 16:36
what does journalctl -u sledgehammer show

2017-05-03 16:37
no route to host.

2017-05-03 16:37
it gets an IP address and shuts up.

2017-05-03 16:37
no more dhcp requests.

greg
2017-05-03 16:39
hmm - what IP did you use for the static-ip on the dr-provision line. It should probably be the internal IP of your admin system.

2017-05-03 16:39
yup, i have em1 (public IP) em2 (192.168.1.1)

2017-05-03 16:40
./dr-provision --static-ip=192.168.1.1 --file-root=/root/drp/dr-provision-install/drp-data/tftpboot --data-root=drp-data/digitalrebar --debug-bootenv=2 --debug-dhcp=2 --debug-renderer=2 --dhcp-ifs=em2

2017-05-03 16:42
it pxes to tftp and gets the sledge bits. then the logs dump out a script. then nothing.

2017-05-03 16:42
I can reset with IPMI.

2017-05-03 16:44
dr-provision2017/05/03 16:29:51.742235 Rendering start-up.sh for All booting discovery
dr-provision2017/05/03 16:29:51.742465 Content: #!/bin/bash
export PS4='${BASH_SOURCE}@${LINENO}(${FUNCNAME[0]}): '
set -x
set -e

2017-05-03 16:44
then I never see anything again.

greg
2017-05-03 16:44
Do you get a machine object?

greg
2017-05-03 16:45
wow.

2017-05-03 16:45
nope.

2017-05-03 16:45
it dumps out the whole script.. up to the echo "Did not get control.sh..."

2017-05-03 16:46
the script was not dumped to console, it's in logs.

greg
2017-05-03 16:46
okay - whew - I was worried.

2017-05-03 16:46
Should I ping the subnet to see if anything sprung to life?

greg
2017-05-03 16:47
yeah - see if the node is reachable, but doesn't seem likely.

2017-05-03 16:47
because the IP that it got DHCP is truly down. nmap says so.

2017-05-03 16:47
? (192.168.1.52) at <incomplete> on em2

2017-05-03 16:48
^ arp -a

greg
2017-05-03 16:48
yeah

greg
2017-05-03 16:50
it is like your boot dev isn't set.

2017-05-03 16:51
hmm.. interesting.

greg
2017-05-03 16:52
the pxelinux.cfg/default should have IPAPPEND 2 on it.

2017-05-03 16:55
ls ./drp-data/tftpboot/pxelinux.cfg/ <- empty

greg
2017-05-03 16:55

2017-05-03 17:01
curl http://192.168.1.1:8091/pxelinux.cfg/default
DEFAULT discovery
PROMPT 0
TIMEOUT 10
LABEL discovery
KERNEL sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/vmlinuz0
INITRD sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/stage1.img
APPEND rootflags=loop root=live:/sledgehammer.iso rootfstype=auto ro liveimg rd_NO_LUKS rd_NO_MD rd_NO_DM provisioner.web=http://192.168.1.1:8091 rs.api=https://192.168.1.1:8092
IPAPPEND 2
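
A quick scripted version of the same check. With PXELINUX, `IPAPPEND 2` adds a `BOOTIF=` argument carrying the booting NIC's MAC to the kernel command line, which is presumably why it was the first thing to verify. `has_ipappend_2` is a made-up helper that reads a config on stdin.

```bash
#!/usr/bin/env bash
# Succeed if the pxelinux config on stdin contains an "IPAPPEND 2" line.
has_ipappend_2() {
    grep -Eq '^[[:space:]]*IPAPPEND[[:space:]]+2[[:space:]]*$'
}

# Against the live provisioner it would be fed from curl, for example:
#   curl -fsS http://192.168.1.1:8091/pxelinux.cfg/default | has_ipappend_2
# Offline demo with a trimmed copy of the config:
if printf 'DEFAULT discovery\nLABEL discovery\nIPAPPEND 2\n' | has_ipappend_2; then
    echo "IPAPPEND 2 present"
else
    echo "IPAPPEND 2 missing"
fi
```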

2017-05-03 17:01
OK... trying from remote host.

2017-05-03 17:02
same

greg
2017-05-03 17:02
looks good

2017-05-03 17:03
and the swagger API is accessible

2017-05-03 17:03
curl -fsSLk https://192.168.1.1:8092/

2017-05-03 17:03
OK

greg
2017-05-03 17:04
drpcli profiles show global

greg
2017-05-03 17:04
anything in those?

2017-05-03 17:04
just the name.

greg
2017-05-03 17:05
ok

greg
2017-05-03 17:05
hmmm - maybe me.

2017-05-03 17:05
I see your greg key in access_keys

2017-05-03 17:05
but whatever, the machine isn't even online.

greg
2017-05-03 17:05
well , if the script breaks it could do that.

greg
2017-05-03 17:06
profiles has something in it? or the examples profiles have something in it.

2017-05-03 17:06
drpcli profiles show global { "Name": "global" }

2017-05-03 17:06
drpcli profiles list [ { "Name": "global" } ]

greg
2017-05-03 17:07
ok

greg
2017-05-03 17:08
Let me test something - i forgot

2017-05-03 17:11
I'm fine to kill and rm -rf the whole thing.

greg
2017-05-03 17:12
Try that and restart with new. Make sure all the templates are aligned.

greg
2017-05-03 17:12
I'm concerned I didn't test sledgehammer/discovery without access_keys.

greg
2017-05-03 17:27
testing now - may have found something.

greg
2017-05-03 17:35
nope - my bad. on the virtualbox setup.

2017-05-03 17:44
script ran, asking for DHCP again - good progress.

greg
2017-05-03 17:46
Just published v3.0.3 - it has a fix for bootenv updating. It is a cli change only.

2017-05-03 17:48
Goodness gracious, there's a machine!

greg
2017-05-03 17:48
:slightly_smiling_face:

greg
2017-05-03 17:49
If you add the access_keys parameter to the global profile like in the example, you should be able to ssh into sledgehammer (after a reboot).

2017-05-03 17:54
assets/profiles/root-access.yaml being the example?

wdennis
2017-05-03 18:19
@greg <offtopic>Here in olde Austin-towne? Anyplace I should try to go and eat at tonight?

greg
2017-05-03 18:23
Location and transportation info needed. BBQ?

wdennis
2017-05-03 18:24
Loc: downtown (Hampton Inn Univ.) transportation: RideAustin or other. Food: Tx-Mex, Mex or BBQ - don't care as long as it's tasty!

greg
2017-05-03 18:26
Thinking

greg
2017-05-03 18:39
soo - Victor and I were discussing -

greg
2017-05-03 18:43
@wdennis BBQ - Salt Lick (North in Round Rock), Iron Works (1st and RedRiver just off IH-35 (downtown)), Stubb's 8th/RedRiver (sometimes has live music)

greg
2017-05-03 18:43
MEX - is harder - Manuel's - Not my normal.

wdennis
2017-05-03 18:44
@greg Thx, looking to get my noms on while here :yum:

greg
2017-05-03 18:45
Tex-Mex - Chuy's and/or Trudy's - Chuy's is good food (chain, but started here). One on North Lamar is not too far from you. The one on riverside is neat because you can walk to Zilker park or other venues.

greg
2017-05-03 18:45
Stubb's is neat because it is near sixth street for music and bars. If that is your thing.

greg
2017-05-03 18:46
Iron works is similar location wise just.

wdennis
2017-05-03 18:46
Music yes, bars not anymore :wink:

wdennis
2017-05-03 18:49
(Except if they have music)

greg
2017-05-03 18:49
most bars have some music especially downtown.

wdennis
2017-05-03 18:52
Loving being back in my home state (born in Houston, lived in El Paso for my early years)

2017-05-03 18:53
Hi rebars

2017-05-03 18:54
Just need a quick help here

2017-05-03 18:54
I am trying to deploy openstack workload on DR

2017-05-03 18:54
But it fails at the last step on openstak-deploy

2017-05-03 18:54
with error
Downloading common from repo http://localhost:8879/charts
helm lint nova
==> Linting nova
[INFO] Chart.yaml: icon is recommended
[ERROR] templates/: render error in "nova/templates/daemonset-libvirt.yaml": template: nova/templates/daemonset-libvirt.yaml:14:62: executing "nova/templates/daemonset-libvirt.yaml" at <include "hash">: error calling include: template: nova/charts/common/templates/_funcs.tpl:22:4: executing "hash" at <include $wtf $contex...>: error calling include: template: nova/templates/configmap-etc.yaml:9:56: executing "nova/templates/configmap-etc.yaml" at <include "template">: error calling include: template: nova/charts/common/templates/_funcs.tpl:12:3: executing "template" at <include $wtf $contex...>: error calling include: template: no template "nova/templates/etc/_ceph.client.cinder.keyring.yaml.tpl" associated with template "gotpl"
Error: 1 chart(s) linted, 1 chart(s) failed
Makefile:51: recipe for target 'build-nova' failed

2017-05-03 18:55
any help will be highly appreciated

2017-05-03 18:55
it's for a school project :)

2017-05-03 18:55
and while we are on it, try Tres-Leches from Chuy's

wdennis
2017-05-03 18:56
@ayush37 :+1:

2017-05-03 18:57
@zehicle is there any assistance regarding the error I am facing, if fixed I owe you a Tres-Leches

greg
2017-05-03 18:57
@ayush37 - not sure. I haven't tried it in 3 months or so. The upstream may have moved.

2017-05-03 18:58
@zehicle Thanks, shall try fixing it by some means

greg
2017-05-03 18:58
I'm refreshing my memory.

greg
2017-05-03 18:58
This is Greg, it is confusing.

2017-05-03 18:59
yeah it is

greg
2017-05-03 18:59
My current guess is that the upstream has moved on. We do a git clone:

greg
2017-05-03 18:59

greg
2017-05-03 18:59
but that is my repo. So - maybe not. Still could have atrophied.

greg
2017-05-03 19:00
What school project?

2017-05-03 19:00
I am a master's student at UT Dallas; it's a research project

greg
2017-05-03 19:00
Hmm - also could be 1.6.1 update.

2017-05-03 19:00
where my professor wants me to try out Digital Rebar

greg
2017-05-03 19:00
nice - okay

greg
2017-05-03 19:01
Interesting - would like to know what for? You may want to stick to more supported things. :slightly_smiling_face:

greg
2017-05-03 19:01
What version of k8s did you install?

greg
2017-05-03 19:01
1.6.1 or 1.5.3 - it will depend upon when you started using DR.

greg
2017-05-03 19:01
It is possible that you are using a k8s version that is too new for the openstack stuff.

2017-05-03 19:01
yeah maybe

2017-05-03 19:02
let me check the k8 installation

2017-05-03 19:02
regarding the project, we are working with Ericsson on a POC

2017-05-03 19:02
which involves the use of Digital Rebar

greg
2017-05-03 19:02
One option is to "redeploy" after changing the version to 1.5.3.

greg
2017-05-03 19:02
For OpenStack or general machine management?

2017-05-03 19:02
for OpenStack only

2017-05-03 19:03
I tried to deploy the openstack workload from GUI

greg
2017-05-03 19:03
Okay- so you are looking for a quick openstack and DR might help with that.

2017-05-03 19:03
of the rebar

2017-05-03 19:03
exactly

2017-05-03 19:04
let me try with the older version of K8

2017-05-03 19:04
thanks a lot

greg
2017-05-03 19:05
Yeah - so, you can change the options - reset things - then don't have the workload autocommit. Go into the deployment and find the k8s-config role/service (right corner of deployment). Edit the k8s version to 1.5.3. Then commit the deployment.

greg
2017-05-03 19:05
Also, you have DR working? Without talking to us? How?

greg
2017-05-03 19:05
:slightly_smiling_face:

2017-05-03 19:05
hahaha, the website was pretty self-explanatory

2017-05-03 19:05
it's really kind of you guys to make it so lucid

greg
2017-05-03 19:06
You must think like I do, which is rare. I'm sorry for you.

greg
2017-05-03 19:06
or Victor. Still sorry for you.

2017-05-03 19:06
if this works, I may get an "A" grade in my subject

2017-05-03 19:06
Cloud Computing

2017-05-03 19:06
:D

greg
2017-05-03 19:07
ok - keep me posted. I'm interested.

2017-05-03 19:07
with a Tres-Leches cake from my side to the DR team

2017-05-03 19:07
sure I will

2017-05-03 19:10
I started the deployment with 1.5.3

greg
2017-05-03 19:12
okay - so it was already at 1.5.3

2017-05-03 19:13
it was at 1.6.1

greg
2017-05-03 19:13
so - restarted. - let's see what happens.

greg
2017-05-03 19:14
With regard to Tres-Leches, I never make it that far. Between the Queso, chips, and meal, I'm usually ready to explode. :slightly_smiling_face:

2017-05-03 19:15
yeah!! exactly, so I always ask the server to make the cake to go

2017-05-03 19:15
once I am home, there's no one stopping me

2017-05-03 19:15
from pounding on it

greg
2017-05-03 19:15
:slightly_smiling_face: planning and gluttony - it is a beautiful thing

2017-05-03 19:16
it's like one of the seven sins, executed perfectly

2017-05-03 19:19
getting back to the DR fiasco: it's running well and currently performing the etcs-install

2017-05-03 19:19
etcd*

2017-05-03 19:50
Unfortunately failed at k8s-dns

2017-05-03 19:50
Err: [WARNING]: Consider using yum, dnf or zypper module rather than running rpm Backtrace: /opt/digitalrebar/core/rails/app/models/jig.rb:52:in `die' /opt/digitalrebar/core/rails/app/models/barclamp_rebar/ansible_playbook_jig.rb:290:in `run' /opt/digitalrebar/core/rails/app/models/run.rb:79:in `run' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/job.rb:15:in `_run' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/job.rb:100:in `block in work' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/adapters/active_record.rb:5:in `block in checkout' /var/cache/rebar/gems/ruby/2.1.0/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/connection_pool.rb:292:in `with_connection' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/adapters/active_record.rb:48:in `checkout_activerecord_adapter' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/adapters/active_record.rb:5:in `checkout' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/job.rb:83:in `work' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:78:in `block in work_loop' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:73:in `loop' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:73:in `work_loop' /var/cache/rebar/gems/ruby/2.1.0/gems/que-0.11.2/lib/que/worker.rb:17:in `block in initialize'

2017-05-03 19:51
@zehicle Now I am trying Ubuntu 16.04 as the OS; the previous deployment was on CentOS 7.2

2017-05-03 20:22
unfortunately failed at same step with k8s version 1.5.3

greg
2017-05-03 20:23
hmm - okay - I'll add it to my queue to retry it myself.

greg
2017-05-03 20:24
Can you describe your environment so I can get close to it?

2017-05-03 20:24
yes it is on AWS Ubuntu 16.04 LTS as Rebar server

greg
2017-05-03 20:24
What nodes?

greg
2017-05-03 20:25
AWS instances?

greg
2017-05-03 20:25
from the aws provider?

2017-05-03 20:25
t2.xlarge

2017-05-03 20:26
So if I try with k8s version 1.5.3 it fails at the k8s-dns step, and if the version is 1.6.1 it fails at the last step, Openstack-deployment

2017-05-03 20:26
thanks a lot @zehicle for your consideration, highly appreciated

greg
2017-05-03 20:27
yeah - I need to see what is failing. "Something" has changed. I may not get to this until tomorrow.

2017-05-03 20:27
yes I understand, thanks again for your effort. I shall also try debugging it; if I find a fix I shall let you know

2017-05-03 20:28
till then, may the force be with you

greg
2017-05-03 20:31
:slightly_smiling_face:

zehicle
2017-05-03 21:07
I'd suggest http://Packet.net over AWS for Openstack tests.

lae
2017-05-03 21:13
there's a rackn code iirc

zehicle
2017-05-03 22:56
Yes, RACKN100 - I think it gets $25 credit or so

lae
2017-05-03 23:10
I think it may actually have been $100?

zehicle
2017-05-03 23:13
it was originally...

zehicle
2017-05-03 23:14
about a year ago. early adopter benefits :slightly_smiling_face:

lae
2017-05-03 23:15
Ah.

lae
2017-05-03 23:16
Guess I got in early enough, haha.

2017-05-04 16:44
@zehicle haha, was working late night, guess i will try with Packet.net in that case

2017-05-04 16:45
thanks for the update though, have a great day!!

zehicle
2017-05-04 16:48
in a few minutes, RackN is going to be posting (twitter, linkedin, facebook, web) a lot about DRP. If you've been having a good time with it, please help us get the word out about it! Thanks.

2017-05-04 16:49
sure thing, will spread it in UT Dallas and Ericsson

lae
2017-05-04 16:49
:+1:

2017-05-05 01:41
hey @zehicle I tried all the permutations and combinations of the k8s version on AWS and Packet. The issue is still the same: it gets stuck at the very last step of Openstack-Deployment, with an error indicating an issue in "helm lint nova". Please let me know if there is a fix, as Ericsson would be really interested in implementing this for expedited OpenStack deployment :smile:

2017-05-05 01:42
clarification: for k8s version 1.6.1 it goes all the way to the last step; for all other versions (1.5.1, 1.5.3, 1.5.7) it fails at K8s DNS

greg
2017-05-05 03:09
okay -

zehicle
2017-05-05 14:45
@Ayush37 if there is commercial interest, then let's talk about that 1x1. I'm at the OpenStack summit next week. This work is exciting but early and will need some sustaining effort.

2017-05-05 14:59
Hey @zehicle , Regarding the commercial interest, I shall pass on the information to my manager as I am an intern and not in a position to make a decision

zehicle
2017-05-05 15:10
We are happy to get on a call and discuss with interested parties. OpenStack Helm is an active project and will require ongoing support to maintain.

wdennis
2017-05-05 23:54
Good to meet the DR crew at DevOpsDays Austin - best DOD I've yet attended!

greg
2017-05-06 03:36
It was good! Glad to meet you too!

greg
2017-05-08 13:58
@chermack - this is the community

chermack
2017-05-08 13:59
has joined #json


wdennis
2017-05-08 16:55
@greg Any idea why I'd be getting this error?

greg
2017-05-08 16:58
Yes - change the 127.0.0.1 in the bar to match your ip address in the overall nav bar.

greg
2017-05-08 16:58
@wdennis

wdennis
2017-05-08 17:01
Why does the Swagger UI assume 127.0.0.1?

greg
2017-05-08 17:02
well - umm - you see - umm - lazy me. It has to do with the fact it isn't very integrated into our system.

wdennis
2017-05-08 17:02
Ah

wdennis
2017-05-08 17:03
Picking your battles eh? :wink:

greg
2017-05-08 17:03
We use the swagger-ui without any changes (except to reference that URL). I'd have to change it a little more to make it more dynamic.

greg
2017-05-08 17:03
We also want to switch which one we use.

wdennis
2017-05-08 17:04
:+1::skin-tone-2:

wdennis
2017-05-08 17:23
@greg Are the .yaml files in assets/profiles automatically loaded, or do they have to be loaded via drpcli?

greg
2017-05-08 17:23
you have to load them. They are examples, because you usually have to change values.

zehicle
2017-05-08 18:53
RE swagger-ui -> there's a new version based on react that we'd switch to also

zehicle
2017-05-08 18:53
the rest of the UX is based on react

zehicle
2017-05-08 18:53
DRP UX

wdennis
2017-05-08 23:30
@greg No helper cmd yet in drpcli to associate profiles with machines?

greg
2017-05-08 23:41
Not yet - soon - maybe tonight.

greg
2017-05-08 23:41
drpcli machines update <uuid> '{ "Profiles": [ "prof1", "prof2" ] }'

greg
2017-05-08 23:42
will do it. WARNING- the UX will eat profiles if you set bootenv. It is not implemented quite right.

wdennis
2017-05-08 23:44
You mean if you change the machine's bootenv thru the UX?

greg
2017-05-08 23:45
yes

greg
2017-05-08 23:45
I noticed that today.

wdennis
2017-05-08 23:46
Thx for that tip, that was my next move :slightly_smiling_face:

greg
2017-05-08 23:46
drpcli machines bootenv <uuid> <bootenv>

greg
2017-05-08 23:47
works though

wdennis
2017-05-08 23:47
So, do I use "drpcli machines update ..." to change the bootenv, or is there a specific drpcli cmd to do that?

wdennis
2017-05-08 23:47
n/m :wink:

greg
2017-05-08 23:48
you can do them in the same update.

greg
2017-05-08 23:48
```drpcli machines update <uuid> '{ "Profiles": [ "prof1", "prof2" ], "BootEnv": "newbootenv" }' ```
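
A minimal sketch of building that combined update from a script, assuming the payload keys are exactly the `Profiles` and `BootEnv` fields shown in the message above. The helper name `machine_update_command` is hypothetical, and the sketch only assembles the argument list; it does not run drpcli.

```python
import json
import shlex


def machine_update_command(uuid, profiles, bootenv=None):
    """Return the argv for one `drpcli machines update` call.

    Profiles and BootEnv go into the same JSON blob, so a single update
    sets both (which avoids going through the UX for the bootenv change).
    """
    payload = {"Profiles": list(profiles)}
    if bootenv is not None:
        payload["BootEnv"] = bootenv
    return ["drpcli", "machines", "update", uuid, json.dumps(payload)]


# "<uuid>" stays a placeholder, as in the chat.
cmd = machine_update_command("<uuid>", ["prof1", "prof2"], "newbootenv")
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to `subprocess.run` instead of printing it would execute the update without any shell quoting concerns.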

wdennis
2017-05-09 19:53
@greg If I want to update an existing profile, can I do the following? ``` drpcli profiles update - < assets/profiles/root-access.yaml ```

wdennis
2017-05-09 19:55
Tried it, throwing an error "requires two arguments"

greg
2017-05-09 20:08
update needs object id

greg
2017-05-09 20:08
then -

wdennis
2017-05-09 20:09
Trying to change the "access_ssh_root_mode" param in the "root-access" profile, doing this: ``` drpcli profiles update "root-access" '{ "access_ssh_root_mode": "yes" }' ``` But no matter what I do, the value remains "true" (instead of "yes", "no", etc.)

wdennis
2017-05-09 20:10
Ah - I see, can do: ``` drpcli profiles update "root-access" - < assets/profiles/root-access.yaml ``` and it works

wdennis
2017-05-10 02:05
@greg Trying to enable root access over SSH (yes, I know, bad...)

wdennis
2017-05-10 02:06
So in my "root-access" profile, I have: ``` access_ssh_root_mode: "yes" ```

wdennis
2017-05-10 02:07
But when I check my resulting /etc/ssh/sshd_config, I see the following:
```
root@testnode01:~# grep -n ^PermitRoot /etc/ssh/sshd_config
28:PermitRootLogin prohibit-password
89:PermitRootLogin without-password
```

wdennis
2017-05-10 02:08
line 89 is the one put there by post-install; why isn't it coming up with ```PermitRootLogin yes```?

greg
2017-05-10 02:25
Bug probably. I'll check

wdennis
2017-05-10 02:54
How can I check the rendered preseed template? A URL get?

greg
2017-05-10 03:19
yes - the template is part of a bootenv and assigned to a machine.

greg
2017-05-10 03:19
The template has a path like: '{{.Machine.Path}}/compute.ks'

greg
2017-05-10 03:20
This means: http://<ip>:8091/machines/<uuid>/compute.ks
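
A small sketch of that path-to-URL mapping, assuming `{{.Machine.Path}}` expands to `machines/<uuid>` and that the static file server listens on port 8091, as the example URL above suggests. The function name `rendered_url` and the sample IP and UUID are hypothetical.

```python
def rendered_url(server_ip, machine_uuid, template_path, port=8091):
    """Expand a bootenv template Path into the URL that serves the rendered file."""
    machine_path = f"machines/{machine_uuid}"
    relative = template_path.replace("{{.Machine.Path}}", machine_path)
    return f"http://{server_ip}:{port}/{relative.lstrip('/')}"


# Illustrative values only: a documentation-range IP and a made-up UUID.
url = rendered_url(
    "192.0.2.10",
    "00000000-0000-0000-0000-000000000001",
    "{{.Machine.Path}}/compute.ks",
)
print(url)
```

Fetching that URL (for example with curl) returns the template as rendered for that machine, provided the machine's bootenv is the one that owns the template.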

wdennis
2017-05-10 03:39

wdennis
2017-05-10 03:43
n/m, I see that I have to set the BootEnv for the machine from "local" to (in my case) "ubuntu-16.04-install"

wdennis
2017-05-10 03:50
Next question: If I edit a template, do I have to do something to get DRP to pick up on the changes?

wdennis
2017-05-10 03:50
I am rendering the template, and I don't see my changes...

greg
2017-05-10 04:00
yes - edit the template file.

greg
2017-05-10 04:00
then run: drpcli templates upload filename as filename

wdennis
2017-05-10 04:02
OK, cool, thanks

wdennis
2017-05-10 04:02
Looks good now

wdennis
2017-05-10 04:11
Hmm, looks like maybe a problem now?

wdennis
2017-05-10 04:13
So I did: ``` [dradmin@dr-admin drp]$ drpcli templates upload assets/templates/net-post-install.sh.tmpl as net-post-install.sh.tmpl ``` And now when I try to get the URL http://192.168.1.148:8091/machines/5fcbf69d-287e-4c2c-b085-5858665cd442/post-install.sh I'm seeing nothing returned...

wdennis
2017-05-10 04:14
The net_seed.tmpl upload worked OK though, when I get .../seed I do see the correct preseed

greg
2017-05-10 04:16
usually it is net-post-install.sh

greg
2017-05-10 04:16
Template name and path are not always the same.

greg
2017-05-10 04:17
nvm - you are right.

wdennis
2017-05-10 04:17
Yeah, I have:

wdennis
2017-05-10 04:17
``` { "ID": "net-post-install.sh.tmpl", "Name": "net-post-install.sh", "Path": "{{.Machine.Path}}/post-install.sh" } ``` In the ubuntu-16.04-install bootenv

wdennis
2017-05-10 04:18
I was previously seeing something when I did a get on the .../post-install.sh URL

greg
2017-05-10 04:18
make sure the machine's bootenv is still set to ubuntu-16.04

wdennis
2017-05-10 04:19
It is

wdennis
2017-05-10 04:19
The .../post-install.sh URL doesn't 404 or anything, just returns nothing...

greg
2017-05-10 04:20
okay - so that is usually a render problem.

greg
2017-05-10 04:21
You can check the bootenv errors to see if it shows something. The machine errors could as well

greg
2017-05-10 04:21
drpcli machines show <uuid> | jq .Errors
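
A sketch of the same check done in a script rather than with jq, assuming `drpcli machines show <uuid>` prints a single JSON object with an `Errors` field, as the jq filter above implies. The sample document and the helper name `machine_errors` are hypothetical.

```python
import json


def machine_errors(machine_json):
    """Return the Errors list from a machine object, or [] if absent or null."""
    machine = json.loads(machine_json)
    return machine.get("Errors") or []


# Hypothetical output; a real machine object carries more fields.
sample = '{"Name": "testnode01", "BootEnv": "ubuntu-16.04-install", "Errors": []}'
print(machine_errors(sample))
```

An empty list here does not rule out a render problem, which is why the dr-provision console log is worth checking as well.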

greg
2017-05-10 04:21
The log from dr-provision should show something too.

greg
2017-05-10 04:22
If you need to, you can turn on the debugRenderer preference to 2 and see if it shows anything.

greg
2017-05-10 04:22
It could have a golang template error.

wdennis
2017-05-10 04:24
No errors in relevant machine or bootenv

wdennis
2017-05-10 04:25
Does dr-provision log to filesystem somewhere, or just to console?

wdennis
2017-05-10 04:30
Now actually I see that I *am* getting a 404 on the .../post-install.sh URL

greg
2017-05-10 04:37
console

wdennis
2017-05-10 04:37
Had to restart dr-provision to get the console logs back...

wdennis
2017-05-10 04:38
And I'm seeing:
```
dr-provision2017/05/10 00:22:08.919333 Rendering net-post-install.sh.tmpl for testnode01 booting ubuntu-16.04-install
dr-provision2017/05/10 00:22:08.919393 Static FS: Failed to render template for /machines/5fcbf69d-287e-4c2c-b085-5858665cd442/post-install.sh: template: :54:12: executing "net-post-install.sh.tmpl" at <{{template "set-host...>: template "set-hostname.tmpl" not defined
```
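
A rough sketch for pulling the missing sub-template names out of a log like this one, assuming the renderer always reports them with the `template "<name>" not defined` wording seen above. The function name `missing_templates` is hypothetical.

```python
import re

# Matches the tail of the Go template error, e.g.
#   template "set-hostname.tmpl" not defined
MISSING_TEMPLATE = re.compile(r'template "([^"]+)" not defined')


def missing_templates(log_text):
    """Return each missing template name once, in order of first appearance."""
    seen = []
    for name in MISSING_TEMPLATE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


log_line = (
    "Static FS: Failed to render template for "
    "/machines/5fcbf69d-287e-4c2c-b085-5858665cd442/post-install.sh: "
    'template: :54:12: executing "net-post-install.sh.tmpl" at '
    '<{{template "set-host...>: template "set-hostname.tmpl" not defined'
)
print(missing_templates(log_line))
```

Note that the renderer stops at the first missing sub-template, so the log names one template per failed render even when several are absent.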

greg
2017-05-10 04:38
there you go. the set-hostname.tmpl isn't loaded, it appears.

greg
2017-05-10 04:38
cd assets/templates

greg
2017-05-10 04:39
drpcli templates upload set-hostname.tmpl as set-hostname.tmpl

wdennis
2017-05-10 04:39
Hmmm, didn't change that one, & it exists... ``` -rw-r--r-- 1 dradmin dradmin 559 May 3 13:40 set-hostname.tmpl ```

greg
2017-05-10 04:40
yeah - if you upgrade without doing a bootenv install after upgrade, you are probably missing the subtemplates.

greg
2017-05-10 04:41
you can do:
```
for i in *; do
  drpcli templates upload "$i" as "$i"
done
```
in the assets/templates directory
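
The same bulk upload can be sketched as a script that only builds the commands, which makes it easy to review them before running anything. It assumes, as the loop above does, that each template's ID is simply its file name; `upload_commands` and the temporary demo directory are hypothetical.

```python
import tempfile
from pathlib import Path


def upload_commands(template_dir):
    """Build one `drpcli templates upload <file> as <id>` argv per regular file."""
    commands = []
    for path in sorted(Path(template_dir).iterdir()):
        if path.is_file():
            commands.append(
                ["drpcli", "templates", "upload", str(path), "as", path.name]
            )
    return commands


# Demo against a throwaway directory standing in for assets/templates.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("set-hostname.tmpl", "net_seed.tmpl"):
        Path(demo_dir, name).write_text("")
    demo = upload_commands(demo_dir)

print([cmd[-1] for cmd in demo])
```

Each argv can be handed to `subprocess.run(cmd, check=True)` once the list looks right.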

wdennis
2017-05-10 04:43
Yup, here's what I have:
```
[dradmin@dr-admin drp]$ drpcli templates list | grep "ID"
"ID": "default-elilo.tmpl"
"ID": "default-ipxe.tmpl"
"ID": "default-pxelinux.tmpl"
"ID": "local-elilo.tmpl"
"ID": "local-ipxe.tmpl"
"ID": "local-pxelinux.tmpl"
"ID": "net-post-install.sh.tmpl"
"ID": "net_seed.tmpl"
"ID": "set-hostname.tmpl"
```

greg
2017-05-10 04:44
you are missing the root-access template

greg
2017-05-10 04:44
and others.

greg
2017-05-10 04:44
bootenv install just loads all templates in templates by default now.

greg
2017-05-10 04:44
So that little script snippet above will add them.
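
To see which templates are absent before re-uploading, a comparison along these lines may help. It assumes `drpcli templates list` prints a JSON array of objects that each carry an `ID` key (the grep output earlier only confirms the key name, not the overall shape), and the function name `missing_template_ids` is hypothetical.

```python
import json


def missing_template_ids(list_output, expected_ids):
    """Return the expected template IDs that are not in the loaded list."""
    loaded = {entry["ID"] for entry in json.loads(list_output)}
    return sorted(set(expected_ids) - loaded)


# Hypothetical list output, trimmed to two of the IDs from the chat.
loaded_json = json.dumps(
    [{"ID": "net_seed.tmpl"}, {"ID": "set-hostname.tmpl"}]
)
# In practice the expected IDs would be the file names in assets/templates.
expected = ["net_seed.tmpl", "root-remote-access.tmpl", "set-hostname.tmpl"]
print(missing_template_ids(loaded_json, expected))
```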

greg
2017-05-10 04:44
Or install a new bootenv.

wdennis
2017-05-10 04:45
OK, looks like they're all loaded now

wdennis
2017-05-10 04:46
Aaaaaaaaand we're good!

greg
2017-05-10 04:46
That would explain the no root access.

wdennis
2017-05-10 04:46
Yes

greg
2017-05-10 04:46
hmm - two things to do:

wdennis
2017-05-10 04:46
Time to redeploy

greg
2017-05-10 04:47
1. update the upgrade docs to remind people to reload all templates for that release. 2. See about making that renderer error show up on the machine errors list.

wdennis
2017-05-10 04:48
Thanks man

greg
2017-05-10 04:48
np

wdennis
2017-05-10 13:11
Good morning @greg :slightly_smiling_face:

wdennis
2017-05-10 13:11
Got just about everything where I want it now, except the apt sources.list...

wdennis
2017-05-10 13:11
This is what I'm getting now:
```
root@testnode01:~# grep -v ^# /etc/apt/sources.list | grep -v ^$
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial main restricted
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-updates main restricted
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial universe
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-updates universe
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial multiverse
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-updates multiverse
deb http://192.168.1.148:8091/ubuntu-16.04/install xenial-backports main restricted universe multiverse
```

greg
2017-05-10 13:12
We shouldn't be touching it anymore.

wdennis
2017-05-10 13:12
Looks like it is being touched?

greg
2017-05-10 13:14
Make sure you aren't setting local_repo as a parameter on anything.

wdennis
2017-05-10 13:14
n/m, just found it...
```
[dradmin@dr-admin drp]$ cat assets/profiles/local-repo.yaml
Name: local-repo
Params:
  local_repo: true
```

wdennis
2017-05-10 13:15
I did not set it to "true"...

greg
2017-05-10 13:15
Make sure that isn't assigned to a machine and it isn't set in global.

greg
2017-05-10 13:15
Loading it as a profile is okay. putting it on a machine is not.

greg
2017-05-10 13:16
Okay - so - I found the other place.

greg
2017-05-10 13:16
You may need to play with it.

wdennis
2017-05-10 13:16
It is incorporated in a profile I think?

greg
2017-05-10 13:17
Adding a profile into the system doesn't directly do anything.

wdennis
2017-05-10 13:17
Setting the value to "false" should disable this, yes?

greg
2017-05-10 13:17
You have to add that profile to the machine's list or make that change to the global profile.

greg
2017-05-10 13:17
You can, but it won't change the problem.

greg
2017-05-10 13:18
The problem is these lines in the preseed file:

greg
2017-05-10 13:18
```
d-i mirror/protocol string {{.ParseUrl "scheme" .Env.InstallUrl}}
d-i mirror/http/hostname string {{.ParseUrl "host" .Env.InstallUrl}}
d-i mirror/http/directory string {{.ParseUrl "path" .Env.InstallUrl}}
```

wdennis
2017-05-10 13:19
So, DRP apt sources are default still?

greg
2017-05-10 13:20
well - kinda

wdennis
2017-05-10 13:23
So, I should comment out those preseed file lines?

greg
2017-05-10 13:24
replace them with:

greg
2017-05-10 13:24
```
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
```

greg
2017-05-10 13:27
```
{{if (eq "debian" .Env.OS.Family)}}
d-i mirror/protocol string http
d-i mirror/http/hostname string http://http.us.debian.org
d-i mirror/http/directory string /debian
{{else}}
{{ if .ParamExists "local_repo" }}
{{ if eq (.Param "local_repo") true }}
d-i mirror/protocol string {{.ParseUrl "scheme" .Env.InstallUrl}}
d-i mirror/http/hostname string {{.ParseUrl "host" .Env.InstallUrl}}
d-i mirror/http/directory string {{.ParseUrl "path" .Env.InstallUrl}}
{{else}}
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
{{end}}
{{else}}
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
{{end}}
{{end}}
```
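
The branching in that template can be read as the following decision function. This is only a model of the logic for reasoning about which mirror a machine ends up with, not DRP code. It assumes `.ParseUrl "host"` yields host plus port, and `mirror_settings` is a hypothetical name.

```python
from urllib.parse import urlparse


def mirror_settings(os_family, params, install_url):
    """Mirror the template's branches: debian, local_repo true, or Ubuntu default."""
    if os_family == "debian":
        return {
            "protocol": "http",
            "hostname": "http://http.us.debian.org",
            "directory": "/debian",
        }
    # Only an explicit boolean true selects the DRP-hosted install tree.
    if params.get("local_repo") is True:
        url = urlparse(install_url)
        return {
            "protocol": url.scheme,
            "hostname": url.netloc,
            "directory": url.path,
        }
    # Param missing or not true: fall back to the public Ubuntu archive.
    return {"hostname": "http://archive.ubuntu.com", "directory": "/ubuntu"}


install_url = "http://192.168.1.148:8091/ubuntu-16.04/install"
print(mirror_settings("ubuntu", {}, install_url))
print(mirror_settings("ubuntu", {"local_repo": True}, install_url))
```

The two `{{else}}` branches in the template collapse into the single fallback here, since a missing param and a false param produce the same lines.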

greg
2017-05-10 13:28
That would honor the local_repo var. I'm going to test that and commit it.

wdennis
2017-05-10 13:33
And if the "local-repo" profile is not included in the local machine's Profile or Profiles list, or in the global profile, then I should get the Ubuntu apt sources?

greg
2017-05-10 13:33
yes - if you add the above piece in place.

greg
2017-05-10 13:34
I think.

greg
2017-05-10 13:34
I haven't tried it.

wdennis
2017-05-10 13:37
Well here goes nothin' :slightly_smiling_face:

wdennis
2017-05-10 13:40
Looks like a winner - from rendered seed:
```
d-i mirror/http/hostname string http://archive.ubuntu.com
d-i mirror/http/directory string /ubuntu
```

wdennis
2017-05-10 13:41
I'll next load the local-repo profile on another test box and see if I get the DRP repo

wdennis
2017-05-10 13:46
Hmm, doesn't seem to work, get the same lines as above in the seed...

wdennis
2017-05-10 13:48
n/m, didn't add it to the profiles list via drpcli...

greg
2017-05-10 13:49
tip has new commands addprofile and removeprofile on the machines object.

wdennis
2017-05-10 13:51
nice

greg
2017-05-10 13:51
At oscon today. Will be off and on

wdennis
2017-05-10 13:52
ack

wdennis
2017-05-10 13:52
OK, with the "local-repo" profile loaded, it does work:
```
d-i mirror/protocol string http
d-i mirror/http/hostname string 192.168.1.148:8091
d-i mirror/http/directory string /ubuntu-16.04/install
```

wdennis
2017-05-10 16:15
FYI, as far as enabling remote root access...

wdennis
2017-05-10 16:19
I see the post-install.sh just adds a second "PermitRootLogin" line to sshd_config, to wit:
```
root@testnode02:~# grep -n ^PermitRoot /etc/ssh/sshd_config
28:PermitRootLogin prohibit-password
89:PermitRootLogin yes
```

wdennis
2017-05-10 16:20
As it turns out, later directives don't overrule former ones in sshd_config

wdennis
2017-05-10 16:20
Always takes the first directive

wdennis
2017-05-10 16:22
So, I modified the "root-remote-access.tmpl" thusly:
```
{{if .ParamExists "access_keys"}}
mkdir -p /root/.ssh
cat >/root/.ssh/authorized_keys <<EOFSSHACCESS
### BEGIN GENERATED CONTENT
{{ range $key := .Param "access_keys" }}
{{$key}}
{{ end }}
### END GENERATED CONTENT
EOFSSHACCESS
{{end}}

sed --in-place -re '/^PermitRootLogin/ s/prohibit-password/{{if .ParamExists "access_ssh_root_mode"}}{{.Param "access_ssh_root_mode"}}{{else}}without-password{{end}}/' /etc/ssh/sshd_config
```

wdennis
2017-05-10 16:24
Which generates when rendered:
```
mkdir -p /root/.ssh
cat >/root/.ssh/authorized_keys <<EOFSSHACCESS
### BEGIN GENERATED CONTENT
ssh-rsa [redacted] will@Wills-MacBook-Air
### END GENERATED CONTENT
EOFSSHACCESS

sed --in-place -re '/^PermitRootLogin/ s/prohibit-password/yes/' /etc/ssh/sshd_config
```
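
A small model of why appending a second `PermitRootLogin` line had no effect and why the in-place edit does. It relies on sshd using the first value it obtains for a keyword, and it imitates the sed expression line by line. Both function names are hypothetical, and the sample config is cut down to the relevant lines.

```python
def effective_value(config_text, keyword):
    """Return the first value set for keyword, which is the one sshd uses."""
    for line in config_text.splitlines():
        parts = line.split(None, 1)
        # Keywords are case-insensitive; commented lines never match.
        if len(parts) == 2 and parts[0].lower() == keyword.lower():
            return parts[1].strip()
    return None


def set_root_login(config_text, mode):
    """Imitate: sed -re '/^PermitRootLogin/ s/prohibit-password/<mode>/'."""
    out = []
    for line in config_text.splitlines():
        if line.startswith("PermitRootLogin"):
            line = line.replace("prohibit-password", mode, 1)
        out.append(line)
    return "\n".join(out) + "\n"


# The appended line 89 from the earlier approach is ignored by sshd.
appended = "PermitRootLogin prohibit-password\nPermitRootLogin yes\n"
print(effective_value(appended, "PermitRootLogin"))

# Rewriting the existing line changes the value sshd actually reads.
edited = set_root_login("PermitRootLogin prohibit-password\n", "yes")
print(effective_value(edited, "PermitRootLogin"))
```

One limitation carried over from the sed version: it only rewrites a line whose current value is `prohibit-password`, so a distribution with a different default would be left unchanged.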

greg
2017-05-10 16:32
cool

greg
2017-05-10 16:33
I'll look to pull something like that in.

wdennis
2017-05-10 16:34
Did not fix "AcceptEnv" yet...
```
root@testnode02:~# grep -n ^AcceptEnv /etc/ssh/sshd_config
75:AcceptEnv LANG LC_*
90:AcceptEnv http_proxy https_proxy no_proxy
```

greg
2017-05-10 16:34
It should probably be removed.

wdennis
2017-05-10 16:34
You need line 90 for proxy environments?

greg
2017-05-10 16:35
yes

greg
2017-05-10 16:35
well - for our DR runners

wdennis
2017-05-10 16:36
Then I guess "http_proxy https_proxy no_proxy" should be added at the end of "AcceptEnv LANG LC_*"

greg
2017-05-10 16:36
Yeah - not sure. Probably.

2017-05-12 20:15
I <3 drp

greg
2017-05-12 20:16
:slightly_smiling_face:

2017-05-12 20:17
Box was on CentOS 6.9 - stuck in the void.

2017-05-12 20:17
Now pxebooting and will be running centos 7.3 ASAP.

2017-05-12 20:33
any way to tell sledgehammer to reboot pxe via drpcli?

2017-05-12 20:38
I used IPMI - but is there a more drpy way?

greg
2017-05-12 20:43
currently no. we are looking at a follow-on to drp -> drpv (digitalrebar provider) that would replace drp but add IPMI/BIOS/RAID. Something like that.

2017-05-12 20:51
Interesting to keep them separate. Microservices, even.

greg
2017-05-12 20:51
yeah - with go - they can live in the same binary, but be implemented that way.

greg
2017-05-12 20:51
So, we can separate them or not.

2017-05-12 21:16
If I just replace the greg key with my pubkey, "centos should work" eh?

2017-05-12 21:17
or do I need to set the params on the global profile?

greg
2017-05-12 21:18
yes

greg
2017-05-12 21:18
make sure the profile is added to the node or add it to the global profile.

2017-05-12 21:20
add profile to global profile?

greg
2017-05-12 21:21
add the param to the global profile.

greg
2017-05-12 21:25
I need to add some docs for that.

greg
2017-05-12 21:25
It is on my list.

2017-05-12 21:31
drpcli profiles show global { "Name": "global", "Params": { "access_keys": { "access_ssh_root_mode": "without-password", "root": "ssh-rsa <stuff> root@os1" } } }

2017-05-12 21:31
LIke so, Obi-wan?

greg
2017-05-12 21:31
access_ssh_root_mode is a peer with access_keys.

greg
2017-05-12 21:31
like this:

2017-05-12 21:32
ah, oops. I see.

greg
2017-05-12 21:32
```
{
  "Name": "global",
  "Params": {
    "access_ssh_root_mode": "without-password",
    "access_keys": {
      "root": "ssh-rsa <stuff> root@os1"
    }
  }
}
```

greg
2017-05-12 21:32
You in fact can use that as an update blob on global

greg
2017-05-12 21:32
drpcli profiles update global - < file.json

greg
2017-05-12 21:33
where that snippet is in file.json
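
A sketch for generating that file.json and catching the nesting mistake from a few messages earlier, where `access_ssh_root_mode` ended up inside `access_keys`. The helper names are hypothetical; the field names are the ones from the snippet above.

```python
import json


def global_params(root_mode, keys):
    """Build the global profile with the mode as a peer of access_keys."""
    return {
        "Name": "global",
        "Params": {
            "access_ssh_root_mode": root_mode,
            "access_keys": dict(keys),
        },
    }


def misplaced_root_mode(profile):
    """True if access_ssh_root_mode was nested inside access_keys by mistake."""
    keys = profile.get("Params", {}).get("access_keys", {})
    return "access_ssh_root_mode" in keys


good = global_params("without-password", {"root": "ssh-rsa <stuff> root@os1"})
bad = {
    "Name": "global",
    "Params": {
        "access_keys": {
            "access_ssh_root_mode": "without-password",
            "root": "ssh-rsa <stuff> root@os1",
        }
    },
}
print(misplaced_root_mode(good), misplaced_root_mode(bad))
print(json.dumps(good, indent=2))  # contents for file.json
```

The nested form matters because the template iterates over every entry in `access_keys`, so a stray mode string there would be written into authorized_keys as if it were a key.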

2017-05-12 21:36
got it. or call params set twice.

greg
2017-05-12 21:37
yes - one with each. :slightly_smiling_face:

2017-05-15 07:37
Hi.

2017-05-15 07:41
I have tried to get Digital Rebar up and running, but something makes it eat up all the memory I give it. I'm trying to use it to provision bare metal.

2017-05-15 07:42
Any known issues?

2017-05-15 07:45
By the way it does seem like a great piece of software by looking at your demos.

rstarmer
2017-05-15 09:11
Giving provision a try, and was going to launch kolla-ansible based openstack on top, but it looks like the ubuntu images point to the control node for apt repo, and that doesn't quite work (complains about security, etc.)

rstarmer
2017-05-15 09:12
any pointers on where the apt repo ends up on disk would be useful. I'm looking to deploy this with a customer tomorrow (well, Monday), and I think I can get the basics going, but I'm not sure how to deal with this interaction?

rstarmer
2017-05-15 10:13
I think where I'm struggling is how do the bootenvs incorporate profiles? I'm not seeing where they get matched.

greg
2017-05-15 12:39
@rstarmer - profiles are attached to machines. They aren't matched per se. You have to add them explicitly.

greg
2017-05-15 12:39
@rstarmer - I have a fix in my tree for ubuntu.

greg
2017-05-15 12:40
I missed a couple of local_repo wrappers.

greg
2017-05-15 12:42
tip will have the fix in about 10 minutes.

rstarmer
2017-05-15 15:19
is there an upgrade process? Or do I stop the service, re-install, and re-start?

greg
2017-05-15 15:19
@svallebro - For digitalrebar, you need to make sure your admin node has at least 2 cores (4 is better) and 6GB of memory.



rstarmer
2017-05-15 15:20
@greg thanks

rstarmer
2017-05-15 16:40
@greg do you have an example of mapping profiles to machines?

greg
2017-05-15 16:50
Did you get tip? The drpcli has a helper command to add profiles to machines

wdennis
2017-05-15 17:59
@greg After working with profiles/templates for a bit now, I think it would be helpful to have a "drpcli template render [name]" command

wdennis
2017-05-15 18:01
That way you don't have to actually apply the templates to a machine via profiles, and then call a URL to see how they would render...

wdennis
2017-05-15 18:01
Of course, there would have to be a way to spec a machine I guess in that command...

greg
2017-05-15 18:02
yes - I have two issues in the backlog for this function. :slightly_smiling_face: @wdennis

wdennis
2017-05-15 18:02
sweet

rstarmer
2017-05-16 04:23
@greg I got tip, and am just now relaunching my ubuntu instance. I'll have a look at the helper function now.

rstarmer
2017-05-16 04:37
hmm, I'm clearly doing something wrong, none of the profiles in assets are ingestible. Also, no docs on profiles?

greg
2017-05-16 05:20
should be there now.

greg
2017-05-16 05:21
tip was updated 2 hours ago with profile helpers, and docs in latest

greg
2017-05-16 05:22
oh - profiles need to be edited for your environment which is why they are not autoimported. @rstarmer

rstarmer
2017-05-16 05:27
I figured they did, and I did, but importing fails, I'll dig a bit more shortly.

rstarmer
2017-05-16 06:03
I updated root access, and this is what I get: drpcli profiles create assets/profiles/ubuntu-access.yaml --format yaml Error: Invalid profile object: error unmarshaling JSON: json: cannot unmarshal string into Go value of type models.Profile

rstarmer
2017-05-16 06:53
@greg got it. Saw your earlier example. But why is it not possible to just pass a file as an argument rather than having to - < file.json? Certainly causes end-user confusion, I expect, after reading the docs to be able to "drpcli profile create -F yaml assets/profiles/my_profile.yaml" And all your default documents are YAML, so why is the default file format JSON? Just a few user suggestions :slightly_smiling_face:

greg
2017-05-16 12:51
```drpcli profiles create assets/profiles/ubuntu-access.yaml --format yaml```

greg
2017-05-16 12:51
should be:

greg
2017-05-16 12:51
```drpcli profiles create - < assets/profiles/ubuntu-access.yaml```

greg
2017-05-16 12:51
it will figure out json vs yaml on the redirect in.

greg
2017-05-16 12:53
oh - sorry, brain is waking up. that was a feature request. The -F or --format is for output only. Input should be either.

jj
2017-05-16 21:12
has joined #json

zehicle
2017-05-16 21:48
@jj can you point @greg to code that installs the agent?

jj
2017-05-16 21:49
sure


jj
2017-05-16 21:50
it's just the typical chef-bootstrap

greg
2017-05-16 22:05
okay cool -

greg
2017-05-16 22:27
Should be doable, I'll see what I can do.

jj
2017-05-16 22:28
awesome.

greg
2017-05-16 22:30
sorry, quick question: are proxy user and proxy pass supposed to be http://<username>

greg
2017-05-16 22:30
same with password

greg
2017-05-16 22:30
Also @jj - okay if I make it more parameterized?


jj
2017-05-16 22:39
@zehicle & @greg

jj
2017-05-16 22:39
@greg absolutely

jj
2017-05-17 00:49

greg
2017-05-17 02:27
@jj - thanks for the PRs. Give it about twenty minutes and they should be in tip images.

jj
2017-05-17 02:28
awesome!

jj
2017-05-17 02:28
i'll be able to verify the esxi 65 images _ideally_ thursday

greg
2017-05-17 02:29
sounds fine. We can always adjust. Many of them are examples anyway. We may want to generate a validated matrix in the docs at some point. Not sure. Something to contemplate.

jj
2017-05-17 02:30
:slightly_smiling_face:

greg
2017-05-17 02:37
while you are around, @jj - I'm thinking about this as a set of parameters for chef client install.
```
Params:
  chef_server_url: https://mumble
  chef_validation_name: vname
  chef_validation_pem_drp_location: files/chef/validation.pem
  chef_validation_pem_ext_location: http??///
  chef_validation_pem_string: filecontent
  chef_client_package_name: fred.rpm
  chef_client_package_drp_location: files/chef/
  chef_client_package_ext_location: http://mumble/...
  chef_client_first_boot_run_list:
    - role1
    - role2
  chef_client_environment: env1
```

greg
2017-05-17 02:37
For proxy, I'll already have a set of params for that. I'll add to it.

jj
2017-05-17 02:38
seems reasonable

greg
2017-05-17 02:38
I'll create defaults for ones that make sense.

greg
2017-05-17 02:39
I'm leery of the last part - about interface naming and forcing.

greg
2017-05-17 02:39
If anything, I'll add it as another helper in the library of functions that can be included, but off by default.

greg
2017-05-17 02:39
Not sure how important it is.

2017-05-17 10:58
Hi, my installation stops at "TASK [gem install kvm slaves]". How can I skip this task? I don't need any KVM management, or are these tools also needed for some other reason? According to the documentation, these tools are needed in a development environment, but I will use Digital Rebar in production.

2017-05-17 12:08
We installed rebar on ubuntu 16.x. The task "TASK [wait for admin convergence [1 upto 20 minutes]]" failed. I am not able to login to the web portal using rebar/rebar1. Is there anything we can check in the logs?

greg
2017-05-17 12:25
@theta-my: Are you behind a proxy? it tries to get things from ruby gems and that can be hard sometimes behind a proxy or if you are in some Asian countries.

greg
2017-05-17 12:27
@nratnakaram_twitter -
```
cd digitalrebar/deploy/compose
docker-compose ps
docker-compose logs rebar_api > /tmp/rebar-api.log
```

greg
2017-05-17 12:27
The ps should show all containers running.

greg
2017-05-17 12:27
The log should show some information about where it is.

greg
2017-05-17 12:28
If a container isn't running, we should be able to see the log for that container by replacing rebar_api with the container name in the logs command.

2017-05-17 12:29
Yes, I'm behind a stack of firewalls.

2017-05-17 12:29
No way out ;)

2017-05-17 12:30
For all installation stuff, I can only use the corporate satellite server. But for gems...

greg
2017-05-17 12:30
okay - then that is going to be a problem. We assume that the admin node has mostly unrestricted outbound access. We have started to do some work to build an install image, but it is still a work in progress.

greg
2017-05-17 12:31
It will use a different deployment method. If we get it working, I may switch the install system over to it.

2017-05-17 12:31
If I know which gems are needed and where they are searched for, I can provide them manually

greg
2017-05-17 12:33
What OS?

2017-05-17 12:33
rhel7

greg
2017-05-17 12:35
in this file: deploy/tasks/base-centos.yml - lines 40-44.

greg
2017-05-17 12:35
You could try deleting them.

greg
2017-05-17 12:35
I'm not sure they are needed for your case.

2017-05-17 12:36
I will give them a try :) Thanks!

greg
2017-05-17 12:36
I suspect this is going to be a slog, but eventually, we are going to expect to get to AWS s3 to get sledgehammer.

2017-05-17 12:36
@zehicle We are not seeing any logs. Is there a specific container that should be running for authentication?

2017-05-17 12:38
We are re-running - ./wait_for_rebar.sh

2017-05-17 12:43
Sorry, the next outgoing connection is requested. "https://get.docker.com/" :(

2017-05-17 12:52
@zehicle ./wait_for_rebar.sh failed again. Authentication is not happening. These are the containers running as of now. There are no logs.

2017-05-17 12:52
[5/17/2017 6:21 PM] Ashokan, Pradeep Kumar:
digitalrebar/dr_rebar_api:master
digitalrebar/dr_goiardi:master
digitalrebar/dr_webproxy:master
digitalrebar/dr_dns:master
digitalrebar/cloudwrap:master
digitalrebar/logging:master
digitalrebar/dr_provisioner:maste
digitalrebar/dr_rev_proxy:master
digitalrebar/dr_trust_me:master
digitalrebar/dr_postgres:master
digitalrebar/rule-engine:master
gliderlabs/consul
digitalrebar/dr_forwarder:master

greg
2017-05-17 12:55
@nratnakaram_twitter - This is @galthaus - the forwarder prepends zehicle. There should be logs. Those are images which look okay.

greg
2017-05-17 12:56
@nratnakaram_twitter - you don't get anything from ``` docker-compose logs -f rebar_api ``` when run in digitalrebar/deploy/compose

greg
2017-05-17 12:56
??

greg
2017-05-17 12:57
@theta-my - umm - let me check with @vlowther on where we are with the single image.

greg
2017-05-17 12:57
In either case, you are going to need docker installed on the machine.

greg
2017-05-17 12:57
our install image has all containers in it without having to go to docker hub.

vlowther
2017-05-17 12:59
Basic install stuff works. Don't have anything set up to autogen the artifacts, tho.

2017-05-17 12:59
Ok, this was missing in the pre-check from depploy/rin-in... --help ;)

vlowther
2017-05-17 13:00
It has not been added to any of our ansible install methods yet.

greg
2017-05-17 13:01
Kinda like I said, we assume the admin node has internet access for the time being. We are trying to remove that requirement.

2017-05-17 13:02
I'm happy to help you figure out how to go further with this ;)

greg
2017-05-17 13:03
Can you describe your networking? What is your host environment and what is outbound/inbound inet access like? What are you trying to do with DR?

2017-05-17 13:03
So, docker install was not the problem :( Script stops at "TASK [Get Docker]"

greg
2017-05-17 13:04
@theta-my: That does
```
- name: Get Docker
  get_url: url=https://get.docker.com/ dest=/tmp/docker.sh validate_certs=False
  become: yes
```

greg
2017-05-17 13:04
Which is supposed to pay attention to proxies, but may not.

greg
2017-05-17 13:04
Though that script will attempt to get to the internet as well.

2017-05-17 13:05
I have no direct internet access. I'm working in a production environment. I can only use the repos provided by another team, in line with corporate security.

2017-05-17 13:05
@galthaus Now we got the logs. postgres seems to be in the "restarting" status

greg
2017-05-17 13:05
okay - so do you have docker in those repos?

2017-05-17 13:06
docker as installation package - YES

2017-05-17 13:06
I have run the install some seconds ago, no problem.

greg
2017-05-17 13:06
okay - so -we will need to work through this. I need to go to an awards ceremony for my daughter. I'll be back in a couple of hours.

greg
2017-05-17 13:09
@theta-my - This will not work currently. We need to build you an image that has docker containers, because the next step is to tell docker to pull from the internet a set of containers. It sounds like that is not allowed in your environment. So, we need a different plan. Can you get to the internet and put things on those boxes?

2017-05-17 13:11
Only if I use my workstation as a "proxy" (no, not as network proxy, as file proxy).

greg
2017-05-17 13:12
yeah - so, you can get files and put them in place. That could be workable.

2017-05-17 13:12
yup

2017-05-17 13:35
@galthaus, I am part of @nratnakaram_twitter's team. As we pointed out in the previous chat, postgres keeps restarting all the time. We got the logs for the postgres container:

2017-05-17 13:35
==> Log data will now stream in as it occurs:
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: a6357a579da8 172.17.0.10
2017/05/17 13:32:15 [INFO] serf: Attempting re-join to previously known node: 9151aab4dcf9: 172.17.0.9:8301
2017/05/17 13:32:15 [INFO] agent: Joining cluster...
2017/05/17 13:32:15 [WARN] manager: No servers available
2017/05/17 13:32:15 [ERR] agent: failed to sync remote state: No known Consul servers
2017/05/17 13:32:15 [INFO] agent: (LAN) joining: [10.138.161.161]
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 7f8e39b33473 172.17.0.4
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: ebcb3d2be239 172.17.0.6
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 9151aab4dcf9 172.17.0.9
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 50fae633b067 172.17.0.8
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 34e423275179 172.17.0.7
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 861a1822a01b 172.17.0.5
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: fd4ac25c9be4 172.17.0.11
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: inll50904063h 10.138.161.161
2017/05/17 13:32:15 [INFO] serf: EventMemberJoin: 4d6c2e26f01d 172.17.0.2
2017/05/17 13:32:15 [INFO] serf: Re-joined to previously known node: 9151aab4dcf9: 172.17.0.9:8301
2017/05/17 13:32:15 [INFO] consul: adding server inll50904063h (Addr: tcp/10.138.161.161:8300) (DC: digitalrebar)
2017/05/17 13:32:15 [WARN] memberlist: Refuting a suspect message (from: a6357a579da8)
2017/05/17 13:32:15 [INFO] agent: (LAN) joined: 1 Err: <nil>
2017/05/17 13:32:15 [INFO] agent: Join completed. Synced with 1 initial agents
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    21  100    21    0     0   2753      0 --:--:-- --:--:-- --:--:--  3000
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
initdb: could not access directory "/var/lib/postgresql/data": Permission denied

greg
2017-05-17 14:11
What command are you using to start dr and as what user?

vlowther
2017-05-17 14:55
I am making a Jenkins builder for the offline install bits.

greg
2017-05-17 15:13
Cool

vlowther
2017-05-17 16:24
https://s3-us-west-2.amazonaws.com/rebar-offline/master.html <-- open the file, click the link to get the latest offline install bits

zehicle
2017-05-17 17:30
Nice

rstarmer
2017-05-17 17:36
help. How do I create a user? I.e., what's the password? I'm trying with a YAML of the form:

rstarmer
2017-05-17 17:37
Name: test
Params:
  password: test

rstarmer
2017-05-17 17:37
I also tried the password as a cryptd value

rstarmer
2017-05-17 17:37
@greg @zehicle ^^

greg
2017-05-17 17:41
looking at it now.

greg
2017-05-17 17:45
@rstarmer - well - I don't have that plumbed. I'll be working on that for a little while.

rstarmer
2017-05-17 17:46
:slightly_smiling_face:

rstarmer
2017-05-17 17:51
is there a .isos directory somewhere that things get cached? I.e., can I pre-load the cache?

greg
2017-05-17 17:51
the tftpboot directory has an isos directory like the DigitalRebar cache directory. Putting things there works.

greg
2017-05-17 17:52
The bootenvs install command will create/check an isos directory that is a peer with bootenvs and templates.

greg
2017-05-17 17:52
You can cache there as well.

rstarmer
2017-05-17 17:55
cool, will try now.

rstarmer
2017-05-17 18:09
so how do I delete a bootenv? Seems that destroy fails, and I stopped the install in the middle of trying (very slowly) to download the sledgehammer.tar

rstarmer
2017-05-17 18:09
i.e., what's the easiest way to blow away the environment and rebuild?

rstarmer
2017-05-17 18:14
what is the hashing mechanism for the users? I found the database...

greg
2017-05-17 18:18
golang simple-crypt

greg
2017-05-17 18:18
delete the drp-data directory and restart the dr-provision

greg
2017-05-17 18:19
sledgehammer is already in "use".

greg
2017-05-17 18:19
If you install "local", set the prefs to not ref sledgehammer, you should be able to remove sledgehammer.

rstarmer
2017-05-17 18:27
so I re-built, and using the pre-downloaded sledgehammer, I get an error for both, it says the sha doesn't match when it installs.

greg
2017-05-17 18:27
Partial download of sledgehammer?

greg
2017-05-17 18:27
should match what is in the files.
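
A partial download can be confirmed by hashing the local file and comparing the result with the expected value. This is a minimal sketch, assuming the checksum is SHA-256 (the conversation only says "sha"); the function names are illustrative and not part of any Digital Rebar tool.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's digest equals the expected value (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call (hypothetical path and checksum):
#   matches("/path/to/sledgehammer.tar", "<expected sha256 from the bootenv file>")
```

A truncated file yields a different digest from the complete one, so a mismatch after an interrupted download usually means the file should be fetched again.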

rstarmer
2017-05-17 18:28
possible, but will re-download (the local network is just slow, so using my phone...)

2017-05-17 21:57
ok so... back to this... can we use it to spin up bare metal and XENServer VMs

2017-05-17 22:12
yes for metal. we don't have a XEN provider so you'd need a cloud wrapper like OpenStack OR create a provider that would start VMs on hosts or some other way.

2017-05-17 22:13
We had someone ask about using bash/ansible on the hosts to start VMs that would then register into rebar. there are several options that could be built

2017-05-17 22:14
of course, you could PXE the VMs but that's not what I'd recommend for normal workloads.

2017-05-17 22:14
and if i require gui access from a public ip on an interface? but provisioning via private

2017-05-17 22:14
i could terraform the vms to spin up and bootp

2017-05-17 22:15
yes, you can control the provision interface

2017-05-17 22:16
ok, I'll run through it again on a CentOS 7 VM

2017-05-17 22:17
I don't see a terraform XEN provider, so I'm not sure what you are thinking.

2017-05-17 22:18
there is one :) ive used it before ....

greg
2017-05-18 06:03
Hi All, DRP has a new release v3.0.4 - https://github.com/digitalrebar/provision/releases

greg
2017-05-18 06:08
Stable and tip are updated to this.

greg
2017-05-18 06:08
There are not any changes to migrate. Stop, install, start dr-provision

2017-05-18 07:43
hi rob, can i ask you about your solution with my problem? I have no chance for direct internet access to install/ start digital rebar. I can only use my workstation as a file proxy. At the moment i hang on "TASK [Get Docker]". Docker is installed.

greg
2017-05-18 13:45
@theta-my - I'm looking at it. Hope to have something later this afternoon.

greg
2017-05-18 13:45
Or at least some comments.

greg
2017-05-18 13:46
@theta-my - what is your use case?

2017-05-18 13:53
we will deploy some bare metal servers, orchestrated by IBM ICO, in a customer environment.

2017-05-18 13:54
and we will use digital rebar as IaaS, full automated ;)

2017-05-18 13:57
We need the full stack : metal discovery, inventory, os install based on inventory tags, hands over to chef

2017-05-18 14:10
Is that the answer you expected?

greg
2017-05-18 14:19
Thinking about it. I may have more questions later.

jj
2017-05-18 18:43
@greg ping

jj
2017-05-18 18:44
I just pulled down `tip` to install it on a provisioning machine, and it seems i'm getting:

jj
2017-05-18 18:44
```
admini@echo:/var/log$ /usr/local/bin/dr-provision
-bash: /usr/local/bin/dr-provision: cannot execute binary file: Exec format error
admini@echo:/var/log$
```

greg
2017-05-18 18:49
@jj - how did you run the install?

greg
2017-05-18 18:49
What type of system?

jj
2017-05-18 18:49
it's a xps x86

jj
2017-05-18 18:49
and i was walking through the install instructions

jj
2017-05-18 18:49
i can zoom it if you want

jj
2017-05-18 18:49
(in like 5 mins)

greg
2017-05-18 18:49
os type?

jj
2017-05-18 18:49
ubuntu

greg
2017-05-18 18:50
okay - let me know when you are ready?

jj
2017-05-18 18:51
heh, seems now'll work. http://bit.ly/zoom-jjasghar

2017-05-19 11:43
Hello team, I hang on : "TASK [Get Docker Compose] ****************************************************** fatal: [10.241.236.92]: FAILED! => {"changed": false, "dest": "/usr/local/bin/docker-compose", "failed": true, "msg": "Request failed", "response": "An unknown error occurred: coercing to Unicode: need string or buffer, NoneType found", "state": "absent", "status_code": -1, "url": "https://github.com/docker/compose/releases/download/1.7.1/docker-compose-Linux-x86_64"} "

2017-05-19 11:57
fixed: after installing docker-compose, I had to create a link: "ln -s /usr/bin/docker-compose /usr/local/bin/docker-compose"

greg
2017-05-19 12:34
did you get docker-compose from the internet and then link it?

2017-05-19 12:35
no, installed via pip

greg
2017-05-19 12:35
ok

2017-05-19 13:15
I am getting the error below when I try to install digital-rebar. Could you please help me with it?
TASK [Pull compose images [SLOW]] ****
fatal: [10.138.161.217]: FAILED! => {"changed": true, "cmd": "DR_TAG=master /usr/local/bin/docker-compose pull", "delta": "0:00:05.711742", "end": "2017-05-19 18:32:50.227777", "failed": true, "rc": 1, "start": "2017-05-19 18:32:44.516035", "stderr": "Pulling postgres (digitalrebar/dr_postgres:master)...\nGet https://registry-1.docker.io/v2/: dial tcp 34.205.194.204:443: getsockopt: no route to host", "stderr_lines": ["Pulling postgres (digitalrebar/dr_postgres:master)...", "Get https://registry-1.docker.io/v2/: dial tcp 34.205.194.204:443: getsockopt: no route to host"], "stdout": "", "stdout_lines": []}
to retry, use: --limit @/home/I324148/digitalrebar/deploy/digitalrebar.retry
PLAY RECAP ****
10.138.161.217 : ok=46 changed=19 unreachable=0 failed=1

greg
2017-05-19 13:29
@deepuashokan85 - do you have internet access to docker hub?

2017-05-19 13:32
@zehicle that's where I'm stuck: the system is unable to contact Docker Hub, although git and the yum repos are able to reach the internet

greg
2017-05-19 13:33
Are you through a proxy?

2017-05-19 13:34
yes.. I am through proxy

greg
2017-05-19 13:34
hmm - okay - it should have setup docker to use the proxy, but maybe not. You should check that.

2017-05-19 13:35
when I do wget, I get the message below:
[I324148@inll50904062a digitalrebar]$ wget https://registry-1.docker.io/v2/
--2017-05-19 19:05:20-- https://registry-1.docker.io/v2/
Resolving proxy (proxy)... 172.28.64.41
Connecting to proxy (proxy)|172.28.64.41|:8080... connected.
Proxy request sent, awaiting response... 401 Unauthorized
Authorization failed.
[I324148@inll50904062a digitalrebar]$

2017-05-19 13:36
Is there an alternate way to have Docker Hub local on my system?

2017-05-19 13:39
@zehicle ^^ ??

greg
2017-05-19 13:40
I'm working on it.

greg
2017-05-19 13:40
@deepuashokan85 and @theta_my have similar problems. We are working on an offline install.

2017-05-19 13:42
Great @zehicle, let me know once you have it ready..

2017-05-19 14:03
next stop :worried: "TASK [Pull compose images [SLOW]] ********************************************** fatal: [10.241.236.92]: FAILED! => {"changed": true, "cmd": "DR_TAG=master /usr/local/bin/docker-compose pull", "delta": "0:01:31.028292", "end": "2017-05-19 16:02:13.261874", "failed": true, "rc": 1, "start": "2017-05-19 16:00:42.233582", "stderr": "Pulling postgres (digitalrebar/dr_postgres:master)...\nNetwork timed out while trying to connect to https://index.docker.io/v1/repositories/digitalrebar/dr_postgres/images. You may want to check your internet connection or if you are behind a proxy.", "stdout": "Trying to pull repository registry.access.redhat.com/digitalrebar/dr_postgres ... \nTrying to pull repository docker.io/digitalrebar/dr_postgres ... \nPulling repository docker.io/digitalrebar/dr_postgres", "stdout_lines": ["Trying to pull repository registry.access.redhat.com/digitalrebar/dr_postgres ... ", "Trying to pull repository docker.io/digitalrebar/dr_postgres ... ", "Pulling repository docker.io/digitalrebar/dr_postgres"], "warnings": []}"

2017-05-19 14:04
a wget to https://index.docker.io/v1/repositories/digitalrebar/dr_postgres/images runs with no error

greg
2017-05-19 14:07
This will fail completely.

greg
2017-05-19 14:07
You have to have an internet connection for that step.

greg
2017-05-19 14:08
I keep saying your use case is not really supported directly.

2017-05-19 14:08
yes, i have a proxy configuration which runs fine (otherwise the wget would also have failed...)

2017-05-19 14:09
(after some discussion, an internet connection via proxy is now allowed) :)

2017-05-19 14:13
@zehicle @theta-my, the problem is that from the office network I am unable to talk to Docker Hub; however, from my home network it works for me..

2017-05-19 14:14
I cannot check this directly, https://index.docker.io/v1/repositories/digitalrebar/dr_postgres/ requests a user name and password...

2017-05-19 14:28
Can someone verify whether the requested files are available?

greg
2017-05-19 14:37
it is there and not password protected

greg
2017-05-19 14:37
do you have a password-based proxy?

2017-05-19 14:38
no, the password request comes if i try to use the link in a browser

2017-05-19 14:38
if i use wget, no problem

2017-05-19 14:38
can you send me the complete file path?

2017-05-19 14:38
i will try wget to this

2017-05-19 14:39
only for check

greg
2017-05-19 14:39
it isn't a file.

greg
2017-05-19 14:39
well - it is , but it is a separate protocol that docker uses to get content.

greg
2017-05-19 14:39
My guess is that docker is misconfigured

2017-05-19 14:40
ahhh, separate protocol...

2017-05-19 14:40
not 80 or 443 ...

2017-05-19 14:40
(http/ https)

greg
2017-05-19 14:41
it is those ports and those protos, but it turns into more requests

2017-05-19 14:43
??? Then I'm lost now. I have set a system proxy for http and https, and tried to configure the docker proxy in /etc/systemd/system/docker.service.d/http-proxy.conf

2017-05-19 14:43
but nothing helps

2017-05-19 14:43
something missing?

greg
2017-05-19 14:45
can you run this:

greg
2017-05-19 14:45
docker run hello-world

greg
2017-05-19 14:46
It should output Hello from Docker!

greg
2017-05-19 14:46
That means that you have docker configured correctly for your firewall environment.

2017-05-19 14:46
trying

2017-05-19 14:48
failed

2017-05-19 14:48
network time out

2017-05-19 15:23
ok, docker proxy rechecked and reconfigured, service restarted -> "docker run hello-world" runs fine
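
For reference, the contents of a drop-in like the /etc/systemd/system/docker.service.d/http-proxy.conf mentioned above can be sketched as follows. This follows the layout Docker documents for systemd-managed daemons; the proxy address is a hypothetical placeholder, and the helper function is illustrative only.

```python
def render_docker_proxy_dropin(http_proxy, https_proxy=None, no_proxy=None):
    """Build the text of a systemd drop-in that passes proxy settings to dockerd."""
    lines = [
        "[Service]",
        'Environment="HTTP_PROXY={}"'.format(http_proxy),
        # Assumption: reuse the HTTP proxy for HTTPS unless one is given explicitly.
        'Environment="HTTPS_PROXY={}"'.format(https_proxy or http_proxy),
    ]
    if no_proxy:
        lines.append('Environment="NO_PROXY={}"'.format(no_proxy))
    return "\n".join(lines) + "\n"


# Hypothetical proxy address; replace with the real one.
print(render_docker_proxy_dropin("http://proxy.example.com:8080",
                                 no_proxy="localhost,127.0.0.1"))
```

After the file is written, systemd has to reload it and the daemon has to be restarted (`systemctl daemon-reload`, then `systemctl restart docker`) before `docker run hello-world` will use the proxy.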

greg
2017-05-19 15:31
retry the docker pull

2017-05-19 15:34
working on it; it runs into a storage failure, sounds like not enough storage available at the chosen path...

2017-05-19 16:32
:( was not the failure

2017-05-19 16:32
TASK [Pull compose images [SLOW]] ********************************************** fatal: [10.241.236.92]: FAILED! => {"changed": true, "cmd": "DR_TAG=master /usr/local/bin/docker-compose pull", "delta": "0:01:19.428383", "end": "2017-05-19 18:30:10.865906", "failed": true, "rc": 1, "start": "2017-05-19 18:28:51.437523", "stderr": "Pulling postgres (digitalrebar/dr_postgres:master)...\nPulling rule-engine (digitalrebar/rule-engine:master)...\nPulling consul (gliderlabs/consul:latest)...\nPulling forwarder (digitalrebar/dr_forwarder:master)...\nPulling goiardi (digitalrebar/dr_goiardi:master)...\nPulling trust_me (digitalrebar/dr_trust_me:master)...\nPulling logging (digitalrebar/logging:master)...\nPulling dns (digitalrebar/dr_dns:master)...\nPulling provisioner (digitalrebar/dr_provisioner:master)...\nPulling revproxy (digitalrebar/dr_rev_proxy:master)...\nPulling cloudwrap (digitalrebar/cloudwrap:master)...\nPulling rebar_api (digitalrebar/dr_rebar_api:master)...\nfailed to register layer: devmapper: Thin Pool has 827 free data blocks which is less than minimum required 851 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior", "stdout": "Trying to pull repository registry.access.redhat.com/digitalrebar/dr_postgres ... \nTrying to pull repository docker.io/digitalrebar/dr_postgres ... \nmaster: Pulling from docker.io/digitalrebar/dr_postgres\nDigest: sha256:d94959f8c3294b3da4c8bb0ecb0e786e8cb386998b59a25e2afb50aa51a8bf2a\nTrying to pull repository registry.access.redhat.com/digitalrebar/rule-engine ... \nTrying to pull repository docker.io/digitalrebar/rule-engine ... \nmaster: Pulling from docker.io/digitalrebar/rule-engine\nDigest: sha256:2d42fdf62c74ffecdbc9d4afc2591243179bd13f3a7225cfece0758689ee2f4b\nTrying to pull repository registry.access.redhat.com/gliderlabs/consul ... \nTrying to pull repository docker.io/gliderlabs/consul ... 
\nlatest: Pulling from docker.io/gliderlabs/consul\nDigest: sha256:927a560389df16092364a4c26c976cd9b845800c8b96b4e687451c398f4187c1\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_forwarder ... \nTrying to pull repository docker.io/digitalrebar/dr_forwarder ... \nmaster: Pulling from docker.io/digitalrebar/dr_forwarder\nDigest: sha256:9febe88b6f8ff9b028fbeb10560cdec752d35985aaf5165629f6dd7824e0f129\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_goiardi ... \nTrying to pull repository docker.io/digitalrebar/dr_goiardi ... \nmaster: Pulling from docker.io/digitalrebar/dr_goiardi\nDigest: sha256:db620bbb4994d1074d706362996243353ad1d1f0347e7ec4e8706acb0e7195fe\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_trust_me ... \nTrying to pull repository docker.io/digitalrebar/dr_trust_me ... \nmaster: Pulling from docker.io/digitalrebar/dr_trust_me\nDigest: sha256:194f71e8edf29644ddcf9477d6764c71cff44dabe3c5de42aed779fd488183be\nTrying to pull repository registry.access.redhat.com/digitalrebar/logging ... \nTrying to pull repository docker.io/digitalrebar/logging ... \nmaster: Pulling from docker.io/digitalrebar/logging\nDigest: sha256:4eeee9ee94df703bc69ce1ef0af12d36302ac315873361c55089be701d140e9a\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_dns ... \nTrying to pull repository docker.io/digitalrebar/dr_dns ... \nmaster: Pulling from docker.io/digitalrebar/dr_dns\nDigest: sha256:46c93057b1bb01d1700d9015ef3f03a483b3204aa03ef808589086eddec8259d\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_provisioner ... \nTrying to pull repository docker.io/digitalrebar/dr_provisioner ... \nmaster: Pulling from docker.io/digitalrebar/dr_provisioner\nDigest: sha256:f790122f2d83f8f2a05b1e00d8fa90fbfbf7c093b8d4913faf3055da46b15cc6\nTrying to pull repository registry.access.redhat.com/digitalrebar/dr_rev_proxy ... \nTrying to pull repository docker.io/digitalrebar/dr_rev_proxy ... 
\nmaster: Pulling from docker.io/digitalrebar/dr_rev_proxy\nDigest: sha256:

2017-05-19 16:32
ffb8675f7069613ba4c6697bae49f367e7f43946b170dfa0236d2a923d42c24c\nTrying to pull repository registr

2017-05-19 16:34
I think, this step crashed : failed to register layer: devmapper: Thin Pool has 827 free data blocks which is less than minimum required 851 free data blocks. Create more free space in thin pool or...

2017-05-19 16:35
but : /dev/mapper/datavg-rebar_lv 40G 122M 38G 1% /var/lib/docker should be enough
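
The numbers in the error refer to free blocks in Docker's devicemapper thin pool, which may be a separate device from the filesystem mounted at /var/lib/docker; that could explain why the `df` figure looks sufficient. Here is a small sketch of the arithmetic, assuming the threshold comes from the `dm.min_free_space` option named in the error message and that its documented default of 10% is in effect.

```python
def thin_pool_shortfall(free_blocks, min_required_blocks):
    """Number of blocks that must be freed before a new layer can be registered."""
    return max(0, min_required_blocks - free_blocks)


def approx_pool_blocks(min_required_blocks, min_free_percent=10):
    """Rough pool size implied by the minimum, if it is min_free_percent of the pool.

    Assumption: the default of 10% applies; the result is only an estimate.
    """
    return min_required_blocks * 100 // min_free_percent


# Values taken from the error message above.
free, required = 827, 851
print(thin_pool_shortfall(free, required))  # blocks missing
print(approx_pool_blocks(required))         # estimated total blocks in the pool
```

`docker info` reports the pool's data space (total, used, available) when the devicemapper driver is in use, and that is the figure to compare against rather than the `df` output.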

jj
2017-05-19 16:49
any chance you have a `cURL` command to change the BootEnv handy?

jj
2017-05-19 16:49
anyone ^^

2017-05-19 16:51
curl is installed

2017-05-19 16:53
but what will you do?

jj
2017-05-19 17:06
oh, sorry i mean to run a curl command against the API

jj
2017-05-19 17:07
i usually figure one or two out and copypasta them with the differences i need

jj
2017-05-19 17:07
the "how to get the token" and all

2017-05-19 17:07
If you give me the request ;)

jj
2017-05-19 17:07
`data='{"BootEnv": "local"}'`

jj
2017-05-19 17:08
so leverage the rocketskates token then put against URL/api/v3/machines/UUID with the data as the payload?
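
The request described above can be sketched as follows. The path and payload come from the messages above; the base URL, port, UUID, token, and the bearer-token header are hypothetical placeholders, and the conversation does not confirm whether the API accepts a partial object in this form. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_bootenv_request(base_url, machine_uuid, bootenv, token):
    """Build (but do not send) a PUT request that sets a machine's BootEnv."""
    url = "{}/api/v3/machines/{}".format(base_url.rstrip("/"), machine_uuid)
    body = json.dumps({"BootEnv": bootenv}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: the token is presented as a bearer token.
        "Authorization": "Bearer {}".format(token),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Hypothetical endpoint, UUID, and token, for illustration only.
req = build_bootenv_request("https://drp.example.com:8092", "MACHINE-UUID", "local", "TOKEN")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```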

2017-05-19 17:10
I'm not so familiar with curl, so can you write down the complete command :shy:

jj
2017-05-19 17:10
heh, yeah i'll play around with it. i think i know what i need to do after :rubberducking: you :slightly_smiling_face:

greg
2017-05-19 18:12
well - realize that DRP and DR have slightly different provisioner components at the moment.

greg
2017-05-19 18:12
@jj - drpcli machines bootenv <uuid> <bootenv>

jj
2017-05-19 18:20
Ah! Yeah the python at the end of the new boot.cfg has an error, I was going to see if there was another way @greg

greg
2017-05-19 18:24
My python is functional, but not very good.

greg
2017-05-19 18:39
@jj did you find the bug and fix?

greg
2017-05-19 18:58
the python thing may not be a bug.

greg
2017-05-19 18:58
It is a mismatch in supported cert validation.

jj
2017-05-19 19:01
ah, yeah it just says "there is an error"

greg
2017-05-19 20:20
@jj - I have a fix for the python thing.

jj
2017-05-19 20:20
:open_mouth:

jj
2017-05-19 20:21
I'm having a helluva time trying to get ESXi to boot correctly and install to a USB stick, i might just remove the `ks.cfg` completely at this rate

greg
2017-05-19 20:22
ok

2017-05-21 03:44
ello People ! I am working on one of my machine learning projects and need your help and support . Please fill the survey form . Thanks in advance . https://goo.gl/eHqkHk

2017-05-22 09:52
Hi, can DigitalRebar use Dell iDrac to powercycle and netboot the server ?

2017-05-22 12:19
what do I need to start simple bare-metal on prem with tftp and dhcp on admin host ?

2017-05-22 12:24
anywhere I need to put username and password for idrac admin account ?

2017-05-22 13:53
@maymann check out Provision for DHCP/PXE/TFTP > https://github.com/digitalrebar/provision

2017-05-22 13:54
you'll need to install the full Digital Rebar for out of band management

2017-05-22 13:56
to answer your original question - YES. that's what Digital Rebar does

2017-05-22 13:58
@punitaojha this is not the right forum for this type of survey

2017-05-22 21:09
@zehicle Hey Rob!! Remember me, from the discussion regarding DR failure on AWS

2017-05-22 21:09
I put out the word to my manager about the DR and its use cases

2017-05-22 21:10
Can you reach out to him regarding any potential use cases at Ericsson , His name is "Kumar" and he can be reached at "thalanayar.muthukumar@ericsson.com"

2017-05-22 21:19
Hi, I am trying to create a barclamp from an Ansible playbook. Any pointers to docs or similar? I'm new to both Ansible and Rebar.

2017-05-22 21:44
@svallebro the best thing to do is look at the kubernetes install roles and the ansible jig. https://www.youtube.com/watch?v=uLTA2LA4KG8

2017-05-23 03:02
Thanks

2017-05-24 23:00
Following the mac PXE youtube video. I'm unable to get the machine to fully boot. Getting the error: Failed to download stage2.img for ....

greg
2017-05-24 23:32
Is that file in the tftpboot dir for provision?

2017-05-25 12:43
zehicle: yes I see the file in drp-data/tftpboot/sledgehammer/708de8b878e3818b1c1bb598a56de968939f9d4b/stage2.img

2017-05-25 13:04
?

greg
2017-05-25 13:37
hmm - okay - so, DHCP worked, tftp worked, http didn't ....

greg
2017-05-25 13:44
connecting to the ui (or prefs through the cli), you can change the debug on render to see what files are being sent back in the output of dr-provision.

greg
2017-05-25 13:46
What networking did you configure? Is it a local L2? Which local IP did you use for dr-provision? Is the node multi-homed?

greg
2017-05-25 13:47
The questions are directed to try and make sure that the IP used to contact DRP through the full path is the same.

2017-05-25 13:47
I changed the debug level. Will look at the logs closely soon

2017-05-25 13:48
I'm on mac using virtual box. dr-provision is using the same subnet as my vb interface

2017-05-25 13:48
--static-ip=192.168.61.233/24

greg
2017-05-25 13:48
The IP used for the --static-ip flag needs to be routable by the clients

greg
2017-05-25 13:48
:slightly_smiling_face:

greg
2017-05-25 13:49
host-only network?

2017-05-25 13:49
this is the subnet I created 192.168.61.1/24

2017-05-25 13:49
yes

greg
2017-05-25 13:50
what 61.233 instead of 61.1?

greg
2017-05-25 13:50
what=why?

2017-05-25 13:50
I don't know.. I just type some random number there

2017-05-25 13:50
no reason

greg
2017-05-25 13:51
--static-ip should be the address assigned to the node. On my system that is the .1 address.

2017-05-25 13:51
hmm

2017-05-25 13:51
the node that is booting?

greg
2017-05-25 13:52
vboxnet1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
    ether 0a:00:27:00:00:01
    inet 192.168.57.1 netmask 0xffffff00 broadcast 192.168.57.255

2017-05-25 13:52
I guess the static-ip is a little confusing for me

2017-05-25 13:52
I assume it was the dr-provision itself

greg
2017-05-25 13:52
so static-ip is the IP that dr-provision will use for communications if no other path is obvious.

2017-05-25 13:52
inet 192.168.61.1 netmask 0xffffff00 broadcast 192.168.61.255

greg
2017-05-25 13:52
Yes- use 192.168.61.1 as the static-ip.
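For anyone hitting the same problem, the check being described here can be sketched in a few lines. This is only an illustration, assuming the host-only interface address is known; `check_static_ip` is a hypothetical helper name, not part of dr-provision, and the addresses are the ones quoted in this conversation.

```python
import ipaddress


def check_static_ip(static_ip: str, interface_cidr: str) -> str:
    """Classify a candidate --static-ip against the host interface address.

    interface_cidr is the address/prefix actually assigned to the host
    interface (for example the VirtualBox host-only adapter).
    """
    candidate = ipaddress.ip_interface(static_ip).ip
    iface = ipaddress.ip_interface(interface_cidr)
    if candidate == iface.ip:
        return "ok: matches the address assigned to the interface"
    if candidate in iface.network:
        return "wrong: on the subnet, but not an address this host owns"
    return "wrong: not on the interface's subnet"


# The value tried first in the chat, then the value that worked.
print(check_static_ip("192.168.61.233/24", "192.168.61.1/24"))
print(check_static_ip("192.168.61.1/24", "192.168.61.1/24"))
```

The point of the second branch is that an address can look plausible (same /24) and still be unreachable, because nothing on the host answers for it.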

2017-05-25 13:53
same result

2017-05-25 13:53
when I boot

2017-05-25 13:54
that's it

2017-05-25 13:54
I'm getting a login prompt now

greg
2017-05-25 13:54
Cool!

2017-05-25 13:54
odd.. I thought I tried .1 before

2017-05-25 13:54
anyhow.. it works :)

2017-05-25 13:54
what's the login?

greg
2017-05-25 13:54
root/rebar1

greg
2017-05-25 13:54
drpcli machines list

greg
2017-05-25 13:54
should show the node as well

2017-05-25 13:55
nice

2017-05-25 13:55
I will play around some more.. those videos are helpful

greg
2017-05-25 13:56
I'll take a note to review the docs and maybe add a note to the video about that.

greg
2017-05-25 13:56
Which video did you use?

greg
2017-05-25 13:57
This is actually @galthaus or @greg.

2017-05-25 13:57
I watched the video with the old macbook pro

2017-05-25 13:58
there was one thing didn't work for me during install, but I just ignore it

2017-05-25 13:58
sudo route -n add -net 255.255.255.255 --static-ip=10.0.0.20

2017-05-25 13:58
it said to do that if I'm osx > 10.9

2017-05-25 13:58
but it didn't like the syntax "--static"

greg
2017-05-25 13:58
the --static-ip should be 192.168.61.1 in your case.

greg
2017-05-25 13:59
Yeah - it was an add from the community that was needed for his mac. It may not always be needed.

greg
2017-05-25 13:59
You sometimes need it to make sure the broadcast packets are routed correctly on a mac.

2017-05-25 14:00
it was complaining about a syntax error on the word "static"

2017-05-25 14:00
need to step out..bbl

greg
2017-05-25 14:01
actually, I think greater than/less thans are reversed.

greg
2017-05-25 14:01
I'll fix that.

greg
2017-05-25 14:01
thanks.

2017-05-25 15:00
I believe I also had an issue with the first one too

2017-05-25 15:00
I'm running 10.11.6

2017-05-25 15:10
zehicle: I noticed there's a more comprehensive UI than the swagger one in other videos. How do I access that?

greg
2017-05-25 15:43
Path is /ui

2017-05-25 15:46
that's the one I have been using: <localhost:8092/ui>

2017-05-25 15:48
I'm referring to the other one. Like in k8s video

2017-05-25 15:57
or is that part of the RackN product?

greg
2017-05-25 16:10
That is the bigger digitalrebar ux. Most likely

2017-05-25 16:12
zehicle: how do I access the "bigger DR ux"

greg
2017-05-25 16:27
It is a separate product. You would have to install the full digitalrebar. The question is what are you trying to do and which features do you need?

2017-05-25 16:38
Trying to do an eval of the product and better understanding what it can do.

greg
2017-05-25 16:48
okay - well - check here for digitalrebar info: https://github.com/digitalrebar/digitalrebar

2017-05-25 16:50
Got it thx

2017-05-25 17:28
6GB Ram min now :)

greg
2017-05-25 17:36
for digitalrebar, it is doing a little more than drp. :slightly_smiling_face:

2017-05-25 17:38
just a little

2017-05-25 17:38
:)

2017-05-25 17:48
zehicle: What's a good email for you? It might make sense for us to have a brief chat before I go down the full-blown evaluation path. I can email you a short description of what we are trying to accomplish.

greg
2017-05-25 17:53
and

2017-05-25 17:56
Thx, you will receive an email from Aaron soon

greg
2017-05-25 17:59
:slightly_smiling_face: thanks - we are at Gluecon right now. So reply may not be today.

2017-05-25 18:21
np

2017-05-26 12:03
I installed Digital Rebar and the installation completed successfully. However, the provisioner tab is missing in the UI, even though the provisioner container is running.

2017-05-26 12:03
[root@mo-dc2df9e2a compose]# docker-compose ps
WARNING: The DR_TAG variable is not set. Defaulting to a blank string.
Name                    Command                          State   Ports
-------------------------------------------------------------------------------------------------------------
compose_cloudwrap_1     /sbin/docker-entrypoint.sh       Up
compose_consul_1        /bin/consul agent -config- ...   Up
compose_dns_1           /sbin/docker-entrypoint.sh       Up
compose_forwarder_1     /sbin/docker-entrypoint.sh       Up      0.0.0.0:3000->3000/tcp, 0.0.0.0:443->443/tcp
compose_goiardi_1       /sbin/docker-entrypoint.sh       Up
compose_logging_1       /sbin/docker-entrypoint.sh       Up
compose_postgres_1      /docker-entrypoint.sh postgres   Up
compose_provisioner_1   /sbin/docker-entrypoint.sh       Up
compose_rebar_api_1     /sbin/docker-entrypoint.sh       Up
compose_revproxy_1      /sbin/docker-entrypoint.sh       Up
compose_rule-engine_1   /sbin/docker-entrypoint.sh       Up
compose_trust_me_1      /sbin/docker-entrypoint.sh       Up
compose_webproxy_1      /sbin/docker-entrypoint.sh       Up
[root@mo-dc2df9e2a compose]#

2017-05-26 12:04
Please check and let me know where I am going wrong...

greg
2017-05-26 13:26
First, my guess is that you don't want forwarder mode. You should rerun the command with: --access=HOST

greg
2017-05-26 13:33
second, you can go to https://adminip/health and see if the service has registered.

2017-05-26 18:50
@zehicle, even after rerunning the command with --access=host, the provisioner service is still not showing in the UI

2017-05-26 18:50
{"Map":{"dns-mgmt-service":["172.17.0.7:6754"],"rebar-api-service":["172.17.0.9:3000"],"rule-engine-service":["172.17.0.2:19202"]},"Matcher":{"dns-mgmt-service":"^dns/(._)","rebar-api-service":"^rebar-api/(._)","rule-engine-service":"^rule-engine/(api/.*)"},"Default":"rebar-api-service"}
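The /health output above can be read mechanically: the service map lists what has registered, and nothing provisioner-related is in it. A small sketch of that check follows, assuming only the "Map" portion of the response pasted above; the function names are hypothetical, and the real registered name of the provisioner service is not stated in this conversation, so the check matches on a substring.

```python
import json

# "Map" section copied from the /health response quoted in the chat.
HEALTH = """{"Map": {"dns-mgmt-service": ["172.17.0.7:6754"],
                     "rebar-api-service": ["172.17.0.9:3000"],
                     "rule-engine-service": ["172.17.0.2:19202"]},
             "Default": "rebar-api-service"}"""


def registered_services(health_json: str) -> list:
    """Return the sorted names of services present in the health map."""
    return sorted(json.loads(health_json).get("Map", {}))


def has_service(health_json: str, fragment: str) -> bool:
    """True if any registered service name contains the given fragment."""
    return any(fragment in name for name in registered_services(health_json))


print(registered_services(HEALTH))
print("provisioner registered:", has_service(HEALTH, "provisioner"))
```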

greg
2017-05-26 18:58
capital HOST

greg
2017-05-26 18:59
do you have access to internet?

greg
2017-05-26 18:59
you can do: ```docker-compose logs -f provisioner```

greg
2017-05-26 18:59
It would be nice to know what is in that log.

2017-05-30 09:38
@zehicle, I am getting the messages below from the command: docker-compose logs -f provisioner
[root@mo-dc2df9e2a compose]# docker-compose logs -f provisioner
WARNING: The DR_TAG variable is not set. Defaulting to a blank string.
Attaching to compose_provisioner_1
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Waiting for 192.168.124.11/24
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
100 19 100 19 0 0 3472 0 --:--:-- --:--:-- --:--:-- 3800
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
100 19 100 19 0 0 3314 0 --:--:-- --:--:-- --:--:-- 3800
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
100 19 100 19 0 0 1100 0 --:--:-- --:--:-- --:--:-- 1117
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh
provisioner_1 | % Total % Received % Xferd Average Speed Time Time Time Current
provisioner_1 | Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/05-start-samba.sh
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/10-wait-for-consul.sh

greg
2017-05-30 13:33
Does your admin node have access to amazon s3?

josh
2017-05-30 18:28
has joined #json

2017-05-30 18:50
@zehicle, no, the admin node does not have access to Amazon S3.
[root@mo-dc2df9e2a compose]# telnet opencrowbar.s3-website-us-east-1.amazonaws.com 80
Trying 52.216.82.18...

greg
2017-05-30 18:55
well - that is the problem.

greg
2017-05-30 18:55
currently install requires that the admin node has a path out to the internet.

greg
2017-05-30 18:55
for initial setup
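Since the provisioner simply restarts its entrypoint scripts when the download fails, the failure is easy to miss in a long log. A rough sketch of pulling the unreachable hosts out of such a log is below; it assumes the curl error wording shown earlier in this conversation, and `unreachable_hosts` is a hypothetical helper, not something shipped with Digital Rebar.

```python
import re

# Matches curl's connection-failure line as it appears in the provisioner log.
CURL_FAILURE = re.compile(
    r"curl: \((\d+)\) Failed to connect to (\S+) port (\d+): (.+)"
)


def unreachable_hosts(log_text: str) -> list:
    """Return sorted, de-duplicated (host, port, reason) tuples from a log."""
    found = set()
    for match in CURL_FAILURE.finditer(log_text):
        found.add((match.group(2), int(match.group(3)), match.group(4).strip()))
    return sorted(found)


SAMPLE = """provisioner_1 | Calling cmd: /usr/local/entrypoint.d/15-get-sledgehammer.sh
0 0 0 0 0 0 0 0 --:--:-- 0:02:06 --:--:-- 0curl: (7) Failed to connect to opencrowbar.s3-website-us-east-1.amazonaws.com port 80: Connection timed out
provisioner_1 | Calling cmd: /usr/local/entrypoint.d/00-wait-for-ip.sh
"""

for host, port, reason in unreachable_hosts(SAMPLE):
    print(f"{host}:{port} -> {reason}")
```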

2017-05-31 12:50
@zehicle ,

2017-05-31 12:53
@zehicle, since we are having internet connection issues at the company, I am thinking of deploying digital-rebar on a VM host using my home internet, then taking that OS image and deploying it on our company server. Will that work?

greg
2017-05-31 13:18
You may have to restart it.

greg
2017-05-31 13:18
it should work.

2017-05-31 15:31
How about the Admin IP? It will change when I move the home-based OS image to the company network. How will that work?

2017-05-31 15:35
@zehicle, when I logged into the provisioner container and issued the ps -ef command to check the running processes, I saw the following.
root@59a1537e7152:/# ps -ef
UID   PID  PPID  C  STIME  TTY  TIME      CMD
root  1    0     0  12:48  ?    00:00:00  /bin/bash /sbin/docker-entrypoint.sh
root  22   1     0  12:48  ?    00:00:00  smbd
root  25   22    0  12:48  ?    00:00:00  smbd
root  28   22    0  12:48  ?    00:00:00  smbd
root  43   1     0  12:48  ?    00:00:00  /bin/bash /sbin/docker-entrypoint.sh
root  44   43    0  12:48  ?    00:00:00  curl -fgL -o /tftpboot/sledgehammer/a42c8c66a60b77ca1c769b8dc7e712f6644579ed/sha1sums http://opencrowbar.s3-website-us-east-1.amazonaws.com/sledgehammer/a42
root  46   0     0  12:48  ?    00:00:00  /bin/bash
root  60   46    0  12:49  ?    00:00:00  ps -ef
PID 44 is running curl, which connects to AWS S3 and requires internet access. How will this work if I import the VM into the company network, which will not allow the container to connect to the internet?

greg
2017-05-31 15:56
The images are cached in the image.

greg
2017-05-31 15:57
You can stop the containers and restart them with a specific command line that will change the external ip and not run the ansible playbooks.

greg
2017-05-31 15:57
Or you can contact RackN and get support for the offline install mode.

2017-06-01 07:52
hi guys, I have tried to install several times, but no networks come up or are adjustable in the web interface. Each time I have tried to add one, I cannot configure it to fit my needs. I need the following design:

2017-06-01 07:54
!(/home/de1235092/Documents/Projekts/Talanx/rebar/system_layout_V01.png)

2017-06-01 07:54
(how can i insert a picture?)


2017-06-01 07:56
which access mode should I use? host or forwarder?

greg
2017-06-01 12:46
HOST - note that the word must be capitalized on the command line. Create a network with category admin and group lab for the deploy network, and set its conduit to dhcp. For BMC, create a network with category bmc and group lab for the bmc network, and set its conduit to bmc.

greg
2017-06-01 12:47
DigitalRebar runs on an Admin node. The admin node can have any number of networks it needs, but there are two requirements: the admin node must have routable access to the BMC network, and nodes and the admin node must have bidirectional communication on an admin network.
[11:06] The Admin node is required to be in the L2 network for all DHCP networks unless DHCP relays/BOOTP forwarders can be configured to relay on its behalf.
[11:07] The DigitalRebar admin node will only serve DHCP from configured networks for networks in the bmc and admin categories (I believe). You can force the DHCP server to do more through the cli/api, but it will not necessarily be reflected on the node in DigitalRebar.
[11:09] DigitalRebar Networks are defined by a category and a group. The group is an arbitrary string that allows networks of different categories to be applied to nodes in the same group.
[11:09] The category defines what the network is used for. This is a free-form field that has two special values.
[11:10] The first is admin - networks in this category are assumed to be the PXE booting networks for the bare metal system. When these networks contain a network range named "dhcp", they will serve those addresses to unregistered nodes.
[11:11] DigitalRebar will assign an IP from the host range once the node is discovered and treat that address as a more static address (though it can still be delivered by DHCP).
[11:12] The other reserved category is bmc. This is used by the IPMI roles / management system to direct IPMI configuration. It too should have a host range that will assign addresses from the pool. It can also use DHCP to deliver those bound addresses.
[11:15] Other categories are for use of the administrator. These can be applied to nodes for configuration post installation (or pre, but most useful post) and will assign addresses from the host range to nodes. The group is used to figure out which sets of categories go together. For example, if DigitalRebar discovers a node on network admin-rack1 and it is told to configure ipmi, it will look for a bmc-rack1 network to draw its IPMI address from. I think it falls back to the first bmc network, or to none.
[11:15] Also, the system attempts last-octet alignment across networks if possible.
[11:15] What interface a network is configured on is determined by the conduit.
[11:16] The conduit is only applicable to managed nodes (not the admin node).
[11:17] For non-bmc networks, the conduit can be dhcp, or [-+?][1,10,100,40][gm][0-100] (e.g. 10g1 - which means second 10g found on box).
[11:18] dhcp means that whatever interface the box DHCPs on is the interface used. This is really only useful for the admin network.
[11:18] For bmc networks, the conduit can be dhcp or bmc. dhcp says use DHCP to get the address for the bmc; otherwise the bmc address is set statically.
[11:19] Conduits can be multiple interfaces separated by a comma. This will generate a bonded set.
[11:20] Vlans can also be added on top.
[11:20] In general, admin and bmc don't use conduits or vlans.
[11:21] When an admin or dhcp network is configured for DHCP correctly, you should be able to see that configuration and the current leases in the DHCP nav section of the UX. You may have to hard refresh the UX page to get the latest data.
[11:21] Networks also define routers. The pref of the router determines if it becomes the default gateway. Lower is higher priority.
[11:22] Sooo - when the network is configured the available interfaces are consulted and the lowest preference is used to set the default gateway.
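The conduit notation described above is compact enough that a small parser helps to read it. The sketch below is an interpretation of the informal pattern in the message, not DigitalRebar's own parser: it assumes the leading `-`, `+` or `?` is optional (the `10g1` example has none), that the speed is one of 1, 10, 40 or 100, that the index is zero-based (so `10g1` is the second 10g interface), and it does not try to say what the modifier means, since that is not explained here. `Conduit` and `parse_conduit` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class Conduit(NamedTuple):
    modifier: Optional[str]  # '-', '+', '?' or None; meaning not covered in the chat
    speed: int               # 1, 10, 40 or 100
    unit: str                # 'g' or 'm'
    index: int               # zero-based: 1 means the second matching interface


_CONDUIT = re.compile(r"^([-+?])?(1|10|40|100)([gm])(\d{1,3})$")


def parse_conduit(spec: str) -> List[Conduit]:
    """Parse a conduit string; several comma-separated parts form a bond."""
    if spec in ("dhcp", "bmc"):
        # Keyword conduits name no explicit interface.
        return []
    parsed = []
    for part in spec.split(","):
        match = _CONDUIT.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognised conduit part: {part!r}")
        modifier, speed, unit, index = match.groups()
        if int(index) > 100:
            raise ValueError(f"interface index out of range: {part!r}")
        parsed.append(Conduit(modifier, int(speed), unit, int(index)))
    return parsed


print(parse_conduit("10g1"))
print(parse_conduit("+1g0,+1g1"))
```

A result with more than one entry corresponds to the bonded set mentioned at [11:19].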

greg
2017-06-01 12:47
Dump from another slack conversation.

zehicle
2017-06-02 17:38
@meshiest the link to the gitter chat on the rebar.digital site is wrong -> points to digitalrebar/digitalrebar but should be digitalrebar/core

meshiest
2017-06-02 17:38
has joined #json

2017-06-02 17:48
Howdy! I'm using the rebar-provision quickstart on osx 10.12.5. After tools/discovery-load.sh runs and a subnet is assigned, boxes are able to pull DHCP, but fail on pulling the next file in sequence. The traffic that comes via tcpdump is:
38 RRQ "lpxelinux.0^A^DM-^?M-^?M-|" octet blksize 1456

2017-06-02 17:48
the failed message on the server console is the same

2017-06-02 17:48
*for the same file

2017-06-02 17:49
Any thoughts on where to look to see what's going on?

vlowther
2017-06-02 17:54
hm

vlowther
2017-06-02 17:55
I don't think the extra control characters should be part of the RRQ request.
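To make the point about the read request concrete: a TFTP RRQ is a 2-byte opcode followed by NUL-terminated strings (filename, mode, then option name/value pairs), so any bytes before the first NUL become part of the filename the server looks up. The sketch below is only illustrative; it assumes tcpdump's `^A^DM-^?M-^?M-|` is the usual caret/meta rendering of bytes 0x01 0x04 0xff 0xff 0xfc, and `parse_rrq` and `has_unexpected_bytes` are hypothetical helper names.

```python
import struct


def parse_rrq(packet: bytes):
    """Split a TFTP read request into (filename, mode, options)."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != 1:
        raise ValueError("not a read request (opcode 1)")
    fields = packet[2:].split(b"\x00")
    if fields and fields[-1] == b"":
        fields.pop()  # the final NUL terminator leaves an empty tail
    filename, mode, *rest = fields
    options = dict(zip(rest[0::2], rest[1::2]))
    return filename, mode, options


def has_unexpected_bytes(filename: bytes) -> bool:
    """True if the filename contains anything outside printable ASCII."""
    return any(b < 0x20 or b > 0x7E for b in filename)


# A clean request, and one with the trailing bytes seen in the capture.
clean = b"\x00\x01lpxelinux.0\x00octet\x00blksize\x001432\x00tsize\x000\x00"
odd = b"\x00\x01lpxelinux.0\x01\x04\xff\xff\xfc\x00octet\x00blksize\x001456\x00"

for packet in (clean, odd):
    name, mode, opts = parse_rrq(packet)
    print(name, mode, opts, "suspicious:", has_unexpected_bytes(name))
```

With the extra bytes in place the server is asked for a file that does not exist under tftpboot, which is consistent with the failure being logged for "the same file".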

vlowther
2017-06-02 17:55
What are the boxes?

zehicle
2017-06-02 17:55
thanks @meshiest

2017-06-02 17:56
The boxes are a bit older, based on https://www.asus.com/Commercial-Servers-Workstations/KFSN5DIST/specifications/

2017-06-02 17:56
provision is running locally on my mac, late 2012, retina, 10.12.5

2017-06-02 17:56
I added the route to point directly at said mac goo

2017-06-02 17:56
*too

vlowther
2017-06-02 17:57
hm

vlowther
2017-06-02 17:58
Is the firewall on the mac set to allow incoming TFTP to dr-provision?

2017-06-02 17:58
firewall is off

2017-06-02 17:59
tcpdump shows the packets getting there

vlowther
2017-06-02 18:02
ok

vlowther
2017-06-02 18:03
and lpxelinux.0 is present in drp-data/tftpboot ?

vlowther
2017-06-02 18:03
(it should have been placed there automatically when the provisioner started up)

2017-06-02 18:03
yes

2017-06-02 18:04
https://gist.github.com/bunchc/c5909f3289481c0cd9cf6002167c11f2

vlowther
2017-06-02 18:06
ok, that looks fine.

vlowther
2017-06-02 18:07
Have you created a network in dr-provisioner via the UI?

2017-06-02 18:07
Yup: https://i.imgur.com/MIUxBlQ.png

vlowther
2017-06-02 18:08
ok

vlowther
2017-06-02 18:08
What interface is en3?

2017-06-02 18:09
Thunderbolt ethernet

2017-06-02 18:09
hooked up to the same switch as the box pxe booting

vlowther
2017-06-02 18:11
ok

vlowther
2017-06-02 18:11
I am seeing if I can duplicate here.

vlowther
2017-06-02 18:12
Don't have a Mac handy to test with, but I do have a beefy Linux box.

2017-06-02 18:13
I can spawn an ubuntu VM if we need, was hoping to avoid that layer tho.

2017-06-02 18:19
Victor: I need to step away for a few hours. Will be back tho.

vlowther
2017-06-02 18:22
hm

vlowther
2017-06-02 18:22
I get 13:21:00.186474 IP (tos 0x0, ttl 64, id 1045, offset 0, flags [none], proto UDP (17), length 69) 192.168.124.41.13686 > m4723.tftp: [udp sum ok] 41 RRQ "lpxelinux.0" octet blksize 1432 tsize 0

vlowther
2017-06-02 18:23
that is with tcpdump -vvv

greg
2017-06-02 18:45
@bunchc. Make sure the subnet option with pxelinux.0 doesn't have extra content in it

2017-06-02 21:33
Greg: re: subnet option, the bit in the UI?

2017-06-02 21:42
Looks like it's getting the bits from somewhere, started over from the top and am now getting two different sets of control chars

2017-06-02 21:42
lpxelinux.0M-^? in addition to what it had before

2017-06-02 21:47
The latest with tcpdump -vvv https://gist.github.com/bunchc/4d9f65cc698bda950f8ae57ebe635c00 looks like it sends both lpxelinux.0 and the one with control characters

2017-06-02 22:12
Looks like it's on the NIC end, am able to boot other boxes mostly fine

2017-06-02 22:13
Thanks!

zehicle
2017-06-02 23:46
cool - I'd be interested to know what made the NIC do that

2017-06-03 03:08
I'm not sure /why/ it's sending control characters, other than it looks to be part of an older tftp spec. I'll dig a little bit once I get it sorted out.

2017-06-05 10:06
Hello, I've been using cobbler to set up bare metal nodes, and discovered DR as a potential alternative. I am looking through the table of contents in the docs and not finding much reference to bare metal installs. Is there a handy link?

2017-06-05 10:36
I managed to get it working after watching this:

2017-06-05 10:37
https://youtu.be/LhqbfcCOgwY

2017-06-05 10:38
This was also helpful

2017-06-05 10:38
https://youtu.be/5YWMlYYuu-s

2017-06-05 13:57
Thank you, Simon, I'll take a look!

2017-06-05 20:04
@svallebro those videos will bring in the full Rebar infrastructure which does a lot beyond cobbler. If you want to start simpler with just replacing cobbler, check out the Provision subproject too: https://github.com/digitalrebar/provision

2017-06-05 20:04
sorry should have flagged @eegilbert instead of @svallebro !

2017-06-06 09:00
Thanks, Rob.

2017-06-07 05:01
I created the support ticket, https://rackn.freshdesk.com/support/tickets/49 . I am awaiting a response from support. Could you please help with it?

2017-06-07 10:35
Hi all, I'm trying to install Digital Rebar on a clean Ubuntu 16.04 installation (bare metal) and I'm stuck at "wait for admin convergence". I used "curl -fsSL https://raw.githubusercontent.com/digitalrebar/digitalrebar/master/deploy/quickstart.sh | bash". What am I missing?

greg
2017-06-07 13:46
@zdebinski - how big is the machine you are running the admin node on?

greg
2017-06-07 13:46
You need at least 6GB of memory and ideally 4+ virtual CPUs.

2017-06-07 13:48
It should be enough, I have 32GB of RAM and eight cores

2017-06-07 13:51
I removed apparmor and managed to launch the UI, but the install process failed by timing out and the UI was not running properly

2017-06-07 13:52
It's also failing when dnsmasq-base is installed

greg
2017-06-07 13:59
Digitalrebar runs things that conflict with dnsmasq and ntp. The install ansible should kill all of those.

2017-06-07 14:02
I know, I saw that it tries to stop the services, but the installation still fails if I don't remove it

2017-06-07 14:27
@deepuashokan85 have you registered for RackN support? I'm not showing you in our system.

2017-06-07 14:40
@zehicle_twitter "cmd": "killall dnsmasq", "failed": true, "msg": "[Errno 2] No such file or directory", "rc": 2}" it should be killall5 not killall

2017-06-07 14:41
that is why it was failing

greg
2017-06-07 14:41
oh - ok - hmm - I bet that varies by OS type. :neutral_face:

2017-06-07 14:45
Ubuntu 16.04 - killall5 belongs to sysvinit-utils

2017-06-07 15:22
I've managed to finish the installation, but adding deployments or adding nodes doesn't work.

2017-06-07 15:22
revproxy_1 | 2017/06/07 15:20:13 requested url = /api/v2/deployments
revproxy_1 | 2017/06/07 15:20:13 translated url = https://rebar-api-service/api/v2/deployments
rebar_api_1 | 2017-06-07 15:20:13.514 [20207] [ERROR] EXCEPTION: param is missing or the value is empty: name
rebar_api_1 | 2017-06-07 15:20:13.514 [20207] [ERROR] BACKTRACE:
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/action_controller/metal/strong_parameters.rb:251:in `require'
rebar_api_1 | /opt/digitalrebar/core/rails/app/controllers/deployments_controller.rb:65:in `block in create'
rebar_api_1 | /opt/digitalrebar/core/rails/lib/api_helper.rb:118:in `block in retriable_transaction'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `block in transaction'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/transaction.rb:184:in `within_new_transaction'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activerecord-4.2.5/lib/active_record/connection_adapters/abstract/database_statements.rb:213:in `transaction'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activerecord-4.2.5/lib/active_record/transactions.rb:220:in `transaction'
rebar_api_1 | /opt/digitalrebar/core/rails/lib/api_helper.rb:116:in `retriable_transaction'
rebar_api_1 | /opt/digitalrebar/core/rails/app/controllers/deployments_controller.rb:50:in `create'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/action_controller/metal/implicit_render.rb:4:in `send_action'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/abstract_controller/base.rb:198:in `process_action'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/action_controller/metal/rendering.rb:10:in `process_action'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/abstract_controller/callbacks.rb:20:in `block in process_action'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:117:in `call'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:117:in `call'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:555:in `block (2 levels) in compile'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:505:in `call'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:505:in `call'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:92:in `__run_callbacks__'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:778:in `_run_process_action_callbacks'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/callbacks.rb:81:in `run_callbacks'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/abstract_controller/callbacks.rb:19:in `process_action'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/action_controller/metal/rescue.rb:29:in `process_action'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/action_controller/metal/instrumentation.rb:32:in `block in process_action'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/notifications.rb:164:in `block in instrument'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/activesupport-4.2.5/lib/active_support/notifications.rb:164:in `instrument'

2017-06-07 15:22
rebar_api_1 | /var/cache/rebar/gems/ruby/2.1.0/gems/actionpack-4.2.5/lib/action_controller/metal/instrumentation.rb:30:in `process_action' reba

2017-06-07 15:24
I don't have time to debug it further, I will switch to kargo

greg
2017-06-07 15:32
digitalrebar uses kargo and drives it. You need to add a provider if you are using it against cloud. If you are using metal, you have to define networks and OS install images. Thanks for trying it.

zehicle
2017-06-07 15:34
Your error is that name was not defined in the call
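The exception text ("param is missing or the value is empty: name") covers two cases: the key is absent from the request, or it is present but blank. A client-side check along the same lines can be sketched as follows; it is a loose imitation of that rule rather than the server's actual code, `missing_params` is a hypothetical helper, and the payload shapes are made up for illustration.

```python
def missing_params(payload: dict, required=("name",)) -> list:
    """Return the required keys that are absent or effectively empty."""
    missing = []
    for key in required:
        value = payload.get(key)
        is_blank_text = isinstance(value, str) and not value.strip()
        is_empty_container = isinstance(value, (list, dict)) and not value
        if value is None or is_blank_text or is_empty_container:
            missing.append(key)
    return missing


# Hypothetical request bodies: only the first would pass the check.
print(missing_params({"name": "k8s-lab"}))
print(missing_params({"name": "   "}))
print(missing_params({"deployment": {"name": "k8s-lab"}}))
```

The last case shows how a name can be "provided" and still be reported missing if it is not at the level the server reads it from; whether that is what happened here is not established in this conversation.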

2017-06-07 15:36
I know :) but I provided it :worried:

greg
2017-06-07 15:36
Did you create a provider?

2017-06-07 15:37
I need to deploy kubernetes on metal

greg
2017-06-07 15:37
Did you have nodes discovered by the system?

2017-06-07 15:37
When I go to choose a provider, metal does not show up

2017-06-07 15:38
I will get back to it tomorrow, thanks for the help :)

greg
2017-06-07 15:38
Yes - installing on baremetal means that you would select use existing nodes that have been discovered by DigitalRebar. These systems would need to PXE boot our discovery image and get added to DigitalRebar to be then consumed by the deployment wizard for kubernetes.

aaron.feng
2017-06-07 17:13
has joined #json

dubie
2017-06-07 17:45
has joined #json

ajones
2017-06-07 17:48
has joined #json

pdelarosa-riot
2017-06-07 17:48
has joined #json

eblack
2017-06-07 18:01
has joined #json

mpainter
2017-06-07 18:04
has joined #json

rcameron
2017-06-07 20:18
has joined #json

armagan
2017-06-07 21:36
has joined #json

2017-06-08 07:13
@zehicle What if I have dedicated servers that can't be booted with PXE, that already have a system installed and have only public IPs? Is it possible to use Digital Rebar in this case?

greg
2017-06-08 13:31
Yes - you would "join" them to DR. There are join scripts in the digitalrebar/deploy directory that can be used as a starting point for that process.

rstarmer
2017-06-08 18:24
hey, how do I set DRP to use a local mirror (or rather the install ISO) to provision a node (using ubuntu-16.04, pre-downloaded, and that all seems to work as far as getting the system booted)? I.e., I'm working in a somewhat disconnected environment, and am not sure how to proceed if I need to have a local mirror rather than just using the upstream ones

rstarmer
2017-06-08 18:25
we can use a proxy, but the upstream connection from the lab is _SLOW_ and so we'd rather use/build a local one if that's the best approach.

rstarmer
2017-06-08 18:26
also, how do I add a profile (for ssh keys, for example) to a bootenv? It seems like the template is there (e.g. ubuntu-16.04 looks like it would use access_keys parameters)

jgartrel
2017-06-08 19:20
has joined #json

2017-06-08 19:38
there is a way to do it, I know we discussed it - I could not find the docs so I'm looking around

zehicle
2017-06-08 19:39
@vlowther do you remember how to set the repos for DRP to a local source?


vlowther
2017-06-08 19:51
set the local_repo param to true.

vlowther
2017-06-08 19:52
that will use just the repos off the expanded Ubuntu ISO.

vlowther
2017-06-08 19:54
If you want other repos from somewhere other than the exploded ISO, I don't think we have a pre-baked template that does that.

rstarmer
2017-06-08 20:07
thanks guys, will have a look, and come back with questions.

rstarmer
2017-06-08 20:08
Ah, yes, ok so that template is a part of my ubuntu bootenv, but how do I set the parameter to be true?

rstarmer
2017-06-08 20:09
do I update the bootenv with a JSON/yaml blob?

rstarmer
2017-06-08 20:14
I do need help here, I don't have a good model for how to pass parameters. For example, do I just update the bootenv with the parameters I'd like added?

rstarmer
2017-06-08 20:19
So with that template, if I want to add it to my ubuntu template, I can create a new template section, but it's not clear where it "lives". I.e., what's the path?

rstarmer
2017-06-08 20:24
ok, I see where those templates live and how they are ingested into files, but it isn't clear how I set the local_repo parameter for that bootenv. I've tried a YAML object like ```Params: local_repo: true```

rstarmer
2017-06-08 20:25
but that gets rejected when I do: ``` drpcli bootenvs update bootenvs/ubuntu-16.04.yml local_repo.yaml```

rstarmer
2017-06-08 20:25
local_repo.yaml is where I put the Params: bit

vlowther
2017-06-08 20:28
yeah, that parameter has to be set on the machine, not the bootenv.

rstarmer
2017-06-08 20:29
do I set it directly on the machine? and how might I make that a default (possible?)

rstarmer
2017-06-08 20:32
I think I got it: ``` drpcli machines set "{uuid}" params "local_repo" to "true"```

rstarmer
2017-06-08 20:49
shouldn't ``` drpcli machines params "{uuid}" params < params.yml``` work?

greg
2017-06-08 20:51
the first one

rstarmer
2017-06-08 20:53
but how do I pass a bunch of parameters?

rstarmer
2017-06-08 20:53
like access keys

rstarmer
2017-06-08 20:54
do I have to write out a text json blob rather than nice simply yaml?

greg
2017-06-08 20:56
I often do this: ```drpcli machines set j param rebar-access to "$(cat fred)"```

rstarmer
2017-06-08 20:56
yeah, but I wanted to set local_repo, and access_keys, and...

rstarmer
2017-06-08 20:57
well, it turns out params does happily read JSON, just not yaml (at least as text via the CLI)
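Since the params command accepts JSON, one way to keep a readable source of truth is to build the blob programmatically and write out JSON. The sketch below only produces the JSON text; it makes no claim about the exact drpcli invocation. The `local_repo` and `access_keys` names come from this conversation, while the shape of `access_keys` (a map of label to public key) and the key value itself are placeholders assumed for illustration.

```python
import json


def build_params(local_repo: bool, access_keys: dict) -> str:
    """Render several machine params as one JSON document."""
    params = {
        "local_repo": local_repo,
        # Assumed shape: label -> SSH public key text.
        "access_keys": access_keys,
    }
    return json.dumps(params, indent=2, sort_keys=True)


# Placeholder key material, not a real key.
blob = build_params(True, {"admin": "ssh-ed25519 PLACEHOLDER admin@example"})
print(blob)
```

Writing the output to a file gives something that can be fed to the CLI on stdin in the same way the params.yml attempt above was.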

rstarmer
2017-06-08 20:57
So, the second question is: how do I set that as a default on the bootenv? I don't want to have to do this for every machine after the fact :slightly_smiling_face:

greg
2017-06-08 20:58
global profile

greg
2017-06-08 20:58
drpcli profiles show global

greg
2017-06-08 20:58
Patch the params there and all nodes get them.

rstarmer
2017-06-08 21:00
I'll give that a shot.

rstarmer
2017-06-08 21:01
kewl

rstarmer
2017-06-09 20:01
what's the best way to change the hostname/ip addresses of discovered servers in DRP? Do I have to manually register the nodes, or can I update things somehow? (my experience seemed to indicate that nodes didn't like this to be changed and ended up out of sync with the system)

greg
2017-06-09 20:04
You can change the name on the machine and reboot into sledgehammer. The name is what is used as FQDN.

rstarmer
2017-06-09 20:04
will try. might be that I erased the domain when I tried last time

rstarmer
2017-06-09 20:04
how about IP address?

greg
2017-06-09 20:05
IP address comes from lease to mac to machine.

greg
2017-06-09 20:05
You can reserve an address by creating a reservation of mac to IP.

rstarmer
2017-06-09 20:06
how do I do that?

rstarmer
2017-06-09 20:06
drpcli?

greg
2017-06-09 20:06
yes

rstarmer
2017-06-09 20:07
can I just change the lease and reboot the node?

greg
2017-06-09 20:07
to get a different IP?

rstarmer
2017-06-09 20:08
yeah

greg
2017-06-09 20:08
Can try it. Not sure if that will work, haven't tried it.

rstarmer
2017-06-09 20:08
customer just said he didn't care, but I like order (sometimes :)

rstarmer
2017-06-09 20:08
ok, may be moot, but I may try anyway.

rstarmer
2017-06-09 20:14
and ipmi control is not part of DRP right?

greg
2017-06-09 20:15
correct - it is not

rstarmer
2017-06-09 20:17
thn

rstarmer
2017-06-09 20:17
x

rstarmer
2017-06-12 17:30
default user for sledgehammer? Seems it should be root, but RocketSkates doesn't seem to work (we were able to add an ssh key, but wanted to get proper terminal access defined as well)?

2017-06-12 18:30
https://www.youtube.com/watch?v=uUWU-4ObGIY I was following this demo, came to the end with the same results BUT the machine was not discovered. The last line on the bootme machine on the video is: Connecting to 192.168.124.11:8091 I get the same, but then... wget: can't connect to remote host (192.168.124.11): Operation timed out Failed to download stage2.img for 70.... Which actually makes sense as the bootme vm is not on that subnet... so I'm confused... did the demo fail to discover the machine too, (seems like it would have)... or I'm confused... I dug around a bit and figured out that 192.168.124.11 is the default admin node... but I was doing just the provisioner

zehicle
2017-06-12 18:31
@rstarmer I believe it's rebar

rstarmer
2017-06-12 18:36
@zehicle thanks

rstarmer
2017-06-12 18:36
will give that a shot.

zehicle
2017-06-12 18:41
@ctrees, the demo was missing a step that is added in the docs


zehicle
2017-06-12 18:41
On Darwin, you may have to add a route for broadcast addresses to work. This can be done with the following command. The 192.168.100.1 is the IP address of the interface that you want to send messages through. The install script will make suggestions for you. sudo route add 255.255.255.255 192.168.100.1

2017-06-12 18:42
Thanks I did the route, (as it was hit in demo ;) I'll go check the docs... thanks!

2017-06-12 18:49
@ctrees the demo ends at the same IP request... checking on my end

2017-06-12 18:50
yea I for sure am not tracking... I checked https://127.0.0.1:8091 and it was not live...

2017-06-12 18:51
that should be the tftp endpoint ?correct?

2017-06-12 18:52
try w/o http

2017-06-12 18:52
that's for http static files, would be the same as the tftp root

2017-06-12 18:52
yup... (duh) that worked...

2017-06-12 18:55
the broadcast works but it's the 192.168.124.11 how and why is that even in there... I started to track that down which is why I came here... seems like it should be 127.0.0.1 OR 192.168.56.1 (or whatever was the vmlan network) ?

2017-06-12 18:56
I think that's the sledgehammer default - should be overridden from DHCP settings, I'm looking for the doc

vlowther
2017-06-12 18:58
This is DRP, right?

2017-06-12 18:58
yes

2017-06-12 18:58
yes

vlowther
2017-06-12 18:58
ok

vlowther
2017-06-12 18:59
What command line did you use to launch drp?

2017-06-12 18:59
sudo ./dr-provision --file-root=`pwd`/drp-data/tftpboot --data-root=drp-data/digitalrebar

2017-06-12 19:00
no background is all...

vlowther
2017-06-12 19:00
ok, that is fine.

vlowther
2017-06-12 19:01
Did you add a network to drp via the web UI?


vlowther
2017-06-12 19:03
hm

vlowther
2017-06-12 19:04
so sledgehammer should have been making requests back to 192.168.66.1


2017-06-12 19:04
yea... my thinking too

vlowther
2017-06-12 19:05
assuming that is the actual IP address on vboxnet1
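
The mismatch discussed above (a boot client told to fetch from 192.168.124.11 while it lives on the vboxnet1 network) can be checked quickly with Python's standard `ipaddress` module. This is only a sketch: the /24 prefix for vboxnet1 is an assumption, and `on_same_subnet` is a hypothetical helper name.

```python
import ipaddress


def on_same_subnet(address, network):
    """Return True if `address` falls inside `network` (CIDR notation)."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)


if __name__ == "__main__":
    # Assumption: vboxnet1 is a /24, i.e. 192.168.66.0/24.
    vm_network = "192.168.66.0/24"
    for candidate in ("192.168.124.11", "192.168.66.1"):
        print(candidate, on_same_subnet(candidate, vm_network))
```

Under that assumption only 192.168.66.1 is inside the network, which matches the timeout seen when the client tried 192.168.124.11.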

2017-06-12 19:05
but then the same IP came up in the video BUT... it did not wait till the machine got discovered

2017-06-12 19:05
yea... see it in the route table...

2017-06-12 19:06
I kind of assumed that the 124.11 was the sledge default... so I started to hunt where you override that... that's where I got lost in scripts... I was looking for the startup.sh thing ...

2017-06-12 19:08
I went back to the video... figure I messed a config (just like the route)... BTW I love when 'common' mistakes are made and you show how to debug.... of course they make for longer video... but they sure help me more than 'doing it right the first time'... ;-)

vlowther
2017-06-12 19:09
Are you in Sledgehammer right now?

2017-06-12 19:10
just a sec... have to find that window... I'll screen it if so

vlowther
2017-06-12 19:10
no need

vlowther
2017-06-12 19:10
just paste the contents of /proc/cmdline


2017-06-12 19:11
@ctrees I'll make sure the docs get updated when we resolve this, thanks for your patience

2017-06-12 19:11
I have a heck of a time getting paste from vbox

vlowther
2017-06-12 19:12
ok

vlowther
2017-06-12 19:12
from there, run cat /proc/cmdline
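
When copying from the VM console is awkward, the kernel command line can also be split into key/value pairs so the interesting entries are easier to read off. A small Python sketch; the sample command line and the option name `provisioner.web` are hypothetical, since the real /proc/cmdline contents were never pasted in this log.

```python
import shlex


def parse_cmdline(text):
    """Split a kernel command line into a dict of options.

    Bare flags (no '=') map to True; 'key=value' pairs map to the value.
    """
    options = {}
    for token in shlex.split(text):
        # Split on the first '=' only, so values may contain '='.
        key, sep, value = token.partition("=")
        options[key] = value if sep else True
    return options


if __name__ == "__main__":
    # Hypothetical command line; on a real node, read /proc/cmdline instead.
    sample = "quiet provisioner.web=http://192.168.124.11:8091"
    for key, value in sorted(parse_cmdline(sample).items()):
        print(key, value)
```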


vlowther
2017-06-12 19:14
hm

vlowther
2017-06-12 19:17
You are running drp on a mac?

2017-06-12 19:17
yea... I was going to start to play with how drp passes in configs... as this gets to the root of what I was attempting to replace... all our pxe/ks and internal IPAM stuff... seems like a great fit..

vlowther
2017-06-12 19:17
Not in a VM?

2017-06-12 19:17
yup

vlowther
2017-06-12 19:17
yeah

2017-06-12 19:17
mac mini

2017-06-12 19:17
16GB

vlowther
2017-06-12 19:18
so you will need to run drp with the --static-ip option


vlowther
2017-06-12 19:19
We have code in place that works reliably on Linux to figure out what IP address received a given packet so that we can handle running properly in a multi-homed environment.

vlowther
2017-06-12 19:19
but that code is less reliable on macs.

2017-06-12 19:20
Well... the 'quick fix' 4me is to custom the pxe to put the 192.168.66.1 IP in for default ?

2017-06-12 19:21
... wow... you're telling me you on the fly ID that stuff in the DNS... that's cool...

vlowther
2017-06-12 19:21
Just pass --static-ip=192.168.66.1 to dr-provision at startup

vlowther
2017-06-12 19:21
no, not DNS

vlowther
2017-06-12 19:22
We inspect the raw IP packets for incoming DHCP and TFTP traffic to determine the IP address that the incoming packet was destined for

2017-06-12 19:23
I about did that... checking now ( --static-ip )

vlowther
2017-06-12 19:23
so we can set the address we live at appropriately in future responses that require it.

vlowther
2017-06-12 19:23
On Linux this works reliably.

vlowther
2017-06-12 19:24
On macs we sometimes get all zeroes or other nonsensical data, so we have to fall back to whatever --static-ip is set to
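
The fallback described above can be sketched roughly as follows. This is an illustration in Python, not dr-provision's actual code (which is not shown in this log); the function name `advertised_ip` and its arguments are hypothetical, and "nonsensical data" is narrowed here to "all zeroes or not parseable as an IP address".

```python
import ipaddress


def advertised_ip(packet_dst, static_ip):
    """Pick the address to advertise in replies.

    Use the destination address seen on the incoming packet when it
    looks sane; otherwise fall back to the statically configured one.
    """
    try:
        dst = ipaddress.ip_address(packet_dst)
    except ValueError:
        # Not a valid address at all: treat as nonsensical data.
        return static_ip
    if dst.is_unspecified:
        # All zeroes (0.0.0.0 or ::): the lookup failed.
        return static_ip
    return str(dst)


if __name__ == "__main__":
    print(advertised_ip("0.0.0.0", "192.168.66.1"))
    print(advertised_ip("192.168.66.1", "192.168.66.1"))
```

Preferring the per-packet address keeps the multi-homed behaviour on platforms where detection works, with the static value acting only as a safety net.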

2017-06-12 19:24
rebooting bootme now...

2017-06-12 19:25
working

2017-06-12 19:26
looks good

vlowther
2017-06-12 19:26
this allows us to work properly in multihomed scenarios where a system lives on networks A and B but there is no route between A and B

vlowther
2017-06-12 19:26
and it allows us to work without needing a restart when network interfaces come and go.

2017-06-12 19:27
yea... I'm pretty jazzed about your approach... working up a demo for my boss/es

vlowther
2017-06-12 19:28
I am also impressed to see a mac mini server still being used in the wild. :slightly_smiling_face:

2017-06-12 19:28
oh heck... THATS my GOOD box... ;-)

2017-06-12 19:29
but I've got 3 piles of HPE C7000 I'm going to 'resurrect' for devops

2017-06-12 19:32
If I get the AFS (Andrew, not Apple) setup into a workload... I'll for sure shove that back to you/community as we need to support AFS for more years and CERN is dropping it...

2017-06-12 19:32
I'm updating docs to improve notes on Mac

2017-06-13 00:11
Hello, I'm not sure if this is the right room to ask questions. But anyways, I'm seeing if I can try out digital rebar with vagrant. Following these directions here: http://digital-rebar.readthedocs.io/en/latest/deployment/install/vagrant.html

2017-06-13 00:12
The Vagrantfile points to a bad link, but looking around I found it at https://github.com/digitalrebar/digitalrebar/blob/master/deploy/Vagrantfile

2017-06-13 00:13
The Vagrantfile also references some scripts in there I do not have in this section: # # Admin nodes eat themselves without swap # base.vm.provision "shell", path: "scripts/increase_swap.sh" base.vm.provision "shell", path: "quickstart.sh"

2017-06-13 00:14
Where can I find these files? Or can I just comment that out?

2017-06-13 00:26
The errors I'm seeing when trying to bring it up is: # $ vagrant up base --provider=libvirt =========================================================== Welcome to Digital Rebar Vagrant Machine types available: client (run remote DR) base (DR server) node1[-20] (node for local test) Documentation: https://github.com/digitalrebar/doc/blob/master/deployment/vagrant.rst export REBAR_ENDPOINT and REBAR_KEY to use existing Digital Rebar Server (default = Vagrant admin) TRIGGERS REQUIRED: vagrant plugin install vagrant-triggers see http://www.rubydoc.info/gems/vagrant-triggers/0.2.1 REBAR CLI REQUIRED: rebar cli must be on your path see https://github.com/digitalrebar/doc/tree/master/cli Maintained by RackN, Copyright 2016 =========================================================== To monitor > https://192.168.99.100 (Digital Rebar) After the system is up, you can start the nodes using `vagrant up /node[1-20]/` Bringing machine 'base' up with 'libvirt' provider... There are errors in the configuration of this machine. Please fix the following errors and try again: shell provisioner: _ `path` for shell provisioner does not exist on the host system: /home/choyj/realdr/scripts/increase_swap.sh _ `path` for shell provisioner does not exist on the host system: /home/choyj/realdr/quickstart.sh Vagrant: * Unknown configuration section 'trigger'.

2017-06-13 03:40
@jack-likes-to-code it's been a while since we've updated those vagrant scripts. lately everyone runs docker locally, so they are OK w/ the compose approach on their laptop or the Ansible install from a VM or server

2017-06-13 03:40
I do know that you'll need to install the Vagrant triggers for it to work

zehicle
2017-06-13 03:56
@rstarmer I found the default user for debian in the docs: "rocketskates"

rstarmer
2017-06-13 04:36
... yeah, I tried the documented answers already, but they don't appear to work. I wonder if our addition of an ssh auth token changes the bootstrap code?

greg
2017-06-13 13:21
ssh keys are for root.

greg
2017-06-13 13:22
rocketskates/RocketSkates

greg
2017-06-13 13:22
should be the login user.

2017-06-13 13:54
I'll update the docs to clarify and see about putting it in an FAQ also

2017-06-13 17:40
Hi @zehicle I started reading about/trying out digital rebar yesterday so I'm noobish in this area as my team is interested in using that for cloud deployment. I believe we(SUSE) have a meeting with you on Thursday to go over digital rebar, but I wanted to give it a try before going into the meeting. I was able to get it to run on a vagrant VM yesterday just following section 1.1.2. I could try again just getting it to run via the compose approach, but is there a way to use local VMs as nodes?

2017-06-13 17:42
ls

vlowther
2017-06-13 18:17
Sure, as long as the VMs are attached to the same bridge that the admin node is serving DHCP on, and there are no conflicting DHCP servers.

vlowther
2017-06-13 18:19
If you are running on a Linux box, digitalrebar/core/tools/docker-admin will bring up the DR admin containers, and digitalrebar/core/tools/kvm-slave will spin up QEMU instances that attach to the docker0 bridge and PXE boot by default.

2017-06-13 18:20
oh! I'll have to look into that... installing Ubuntu 16.04 on another box right now. Will give this a try. Thx!

vlowther
2017-06-13 18:21
Those are the tools I use in my daily development.

vlowther
2017-06-13 18:22
has not tested them on opensuse in a while, tho.

vlowther
2017-06-13 18:22
Arch Linux is my daily driver.

2017-06-13 18:23
Currently using Ubuntu for now...

vlowther
2017-06-13 18:24
I won't tell anyone else. :slightly_smiling_face:

2017-06-13 18:28
lol, I'm a transplant from HPE that was acquired by SUSE 3 months ago... Luckily, SUSE is pretty open about these things. :)

2017-06-13 20:05
@jack-likes-to-code there's a video in the training sessions where we show how to do the docker-admin & kvm-slave path

2017-06-13 20:06
@zehicle I was just going to ask you about this

2017-06-13 20:06
can you point me to that?

2017-06-13 20:06
you should also make sure to try DR Provision - it's the newer model deployment

2017-06-13 20:07
I got to the point of bringing up digital-rebar, but I went ahead and ran the kvm-slave to bring up a node and it sits there with: 2017-06-13 13:01:21 -0700: 26660 - PXE booting node (0)

2017-06-13 20:07
should i have run the docker-admin script first?

2017-06-13 20:08
Not familiar with DR Provision. Can you point me to that documentation?

2017-06-13 20:09
https://youtu.be/OBK1Gkv0YH8?list=PLXPBeIrpXjfh2lXdXkNnzAuc7_SUtYJR-&t=243

2017-06-13 20:09
For DRP http://provision.readthedocs.io/en/latest/

2017-06-13 20:09
thanks, I'll take a look

2017-06-13 20:10
@jack-likes-to-code the challenge w/ the kvm-slave script is that it assumes you're using docker0 as the bridge. you'll need to adjust the script to use another bridge if you used Vagrant to bring up the admin node

2017-06-13 20:10
not using vagrant anymore. Just installed ubuntu on a box and ran the quickstart script on it

2017-06-13 20:11
that brought up digital rebar and I can see the UI

2017-06-13 20:11
you can change the bridge by setting the "OCB_BRIDGE" environment variable

vlowther
2017-06-13 20:11
jack-likes-to-code: Yes, you need to run docker-admin and wait for Digital Rebar to come up before running kvm-slave

2017-06-13 20:11
I think kvm-slave should work in that case

vlowther
2017-06-13 20:11
Otherwise the VM will not be able to PXE boot and just sit there forever.

2017-06-13 20:12
ok

2017-06-13 20:12
BTW > Victor is using our Slack bridge, which impersonates my Gitter account. So there are really two people :)

2017-06-13 20:12
I was wondering why victor comes up in all your msgs

zehicle
2017-06-13 20:13
this is how my (Rob's) comments show up by typing from Slack

2017-06-13 20:14
For people who prefer Slack, we're happy to provide invites to that channel. either way, the comments are mirrored

2017-06-13 20:16
@zehicle running docker-admin gives me the following error: ERROR: for forwarder driver failed programming external connectivity on endpoint compose_forwarder_1 (b3cc8cd5024e4c714ec3c14c70fe65d8f745ca7d55943a82cc1df5e9c0916151): Bind for 0.0.0.0:3000 failed: port is already allocated Traceback (most recent call last): File "<string>", line 3, in <module> File "compose/cli/main.py", line 63, in main AttributeError: 'ProjectError' object has no attribute 'msg' docker-compose returned -1 Bringing containers up. Exiting the shell will kill and remove the containers It looks like docker-proxy is already listening on port 3000

2017-06-13 20:17
is it because I'm already running digital rebar? Perhaps I should bring that down first before running docker-admin?

2017-06-13 20:35
yes. from digitalrebar/deploy/compose, use docker-compose stop then docker-compose rm -f
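
Before starting docker-admin again, it can help to confirm that nothing is still holding the port named in the error (3000 here). A minimal Python sketch using only the standard library; `port_in_use` is a hypothetical helper, and it only detects listeners reachable on the given host address.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 3000 is the port from the "port is already allocated" error above.
    print("port 3000 in use:", port_in_use(3000))
```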

2017-06-13 20:44
@zehicle done. I kicked off docker-admin. Looks like ntp service has been sitting around:
```
dev@Z640-extra:~/digitalrebar/deploy/compose$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5699fd40e04f digitalrebar/dr_rebar_api:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_rebar_api_1
cb604b7ea9c0 digitalrebar/dr_goiardi:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_goiardi_1
c4e400eddb06 digitalrebar/cloudwrap:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_cloudwrap_1
87050fbf06bf digitalrebar/dr_trust_me:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_trust_me_1
1557912d03d7 digitalrebar/dr_webproxy:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_webproxy_1
faf9eaaa8484 digitalrebar/dr_rev_proxy:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_revproxy_1
d8853af62a83 digitalrebar/dr_rebar_dhcp:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_dhcp_1
c3b1b4323d2c digitalrebar/dr_provisioner:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_provisioner_1
faf2f86e6631 digitalrebar/logging:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_logging_1
6f7e5d0761e9 digitalrebar/dr_dns:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes compose_dns_1
d7880c396047 digitalrebar/dr_forwarder:master "/sbin/docker-entr..." 3 minutes ago Up 3 minutes 0.0.0.0:443->443/tcp, 0.0.0.0:3000->3000/tcp compose_forwarder_1
555d43930ce9 digitalrebar/dr_postgres:master "/docker-entrypoin..." 3 minutes ago Up 3 minutes compose_postgres_1
9696410fe96d digitalrebar/rule-engine:master "/sbin/docker-entr..." 3 minutes ago Up 31 seconds compose_rule-engine_1
7a49fbda0ce0 gliderlabs/consul "/bin/consul agent..." 3 minutes ago Up 3 minutes compose_consul_1
de8cea4fbee6 digitalrebar/dr_ntp:master "/sbin/docker-entr..." About an hour ago Up About an hour 0.0.0.0:123->123/tcp, 0.0.0.0:123->123/udp compose_ntp_1
```

2017-06-13 20:47
can't seem to figure out why the rebar/rebar1 creds no longer work. What are the new creds when running docker-admin?

zehicle
2017-06-13 20:49
it takes a little while before the user is created

2017-06-13 20:50
thx. i guess i was too impatient

2017-06-13 20:57
still no cigar with kvm-slave. i am able to login to the UI, but kvm-slave still gets stuck at PXE booting node (0). You mentioned that docker0 is the assumed bridge. How do I determine that? `dev@Z640-extra:~/digitalrebar/deploy/compose$ docker network ls NETWORK ID NAME DRIVER SCOPE 1a30c0576175 bridge bridge local 918aa569f74b host host local 902721abd2e5 none null local

2017-06-13 21:00
Trying this again:
```
dev@Z640-extra:~/digitalrebar/deploy/compose$ docker network ls
NETWORK ID NAME DRIVER SCOPE
1a30c0576175 bridge bridge local
918aa569f74b host host local
902721abd2e5 none null local
```

2017-06-13 21:03
It seems everything is up on docker0:
```
dev@Z640-extra:~/digitalrebar/deploy/compose$ brctl show
bridge name bridge id STP enabled interfaces
docker0 8000.0242b3fb13c2 no veth1c3107d
                             veth3ad8868
                             veth3d311d6
                             veth42dbc38
                             veth5268663
                             veth73ba994
                             veth74f95ac
                             veth91e9b8a
                             vethc2366ce
                             vethe6c4c02
                             vethf3a9141
                             vm-88964-0
                             vm-88964-1
                             vm-88964-2
virbr0 8000.000000000000 yes
```

2017-06-13 21:04
there's an IP bridge that you need... forgot to mention!

2017-06-13 21:04
sudo ip a add 192.168.124.4/24 dev docker0

2017-06-13 21:05
http://digital-rebar.readthedocs.io/en/latest/development/dev_env/kvm-slaves.html?highlight=kvm-slave

2017-06-13 21:06
then restart the VMs

2017-06-13 21:37
Thanks! That worked! More reading!

2017-06-13 21:38
:)

2017-06-14 23:22
I see my nodes came up with sledgehammer, then I figured I'd try and deploy Ubuntu on there, but it got stuck trying to find an NTP server. I walked away and eventually the kvm consoles went blank and I couldn't get them out of the blank state. DR shows the nodes as green. It doesn't appear the NTP service is running on there when running docker-admin. How do I bring that up?
```
dev@Z640-extra:~/digitalrebar/deploy/compose$ docker-compose ps
WARNING: The DR_TAG variable is not set. Defaulting to a blank string.
Name Command State Ports
-------------------------------------------------------------------------------------------------------------
compose_cloudwrap_1 /sbin/docker-entrypoint.sh Up
compose_consul_1 /bin/consul agent -config- ... Up
compose_dhcp_1 /sbin/docker-entrypoint.sh Up
compose_dns_1 /sbin/docker-entrypoint.sh Up
compose_forwarder_1 /sbin/docker-entrypoint.sh Up 0.0.0.0:3000->3000/tcp, 0.0.0.0:443->443/tcp
compose_goiardi_1 /sbin/docker-entrypoint.sh Up
compose_logging_1 /sbin/docker-entrypoint.sh Up
compose_postgres_1 /docker-entrypoint.sh postgres Up
compose_provisioner_1 /sbin/docker-entrypoint.sh Up
compose_rebar_api_1 /sbin/docker-entrypoint.sh Up
compose_revproxy_1 /sbin/docker-entrypoint.sh Up
compose_rule-engine_1 /sbin/docker-entrypoint.sh Up
compose_trust_me_1 /sbin/docker-entrypoint.sh Up
compose_webproxy_1 /sbin/docker-entrypoint.sh Up
```

2017-06-15 00:35
very strange! you should be able to docker-compose ntp up -d. When you install, using --con-ntp should work; however, it should be included in your default set

2017-06-15 00:35
check the NTP role in the system deployment

2017-06-15 00:36
and see how that is set, but it should have turned green if the service was missing unless you picked external ntp.

2017-06-15 13:34
@jack-likes-to-code FWIW, NTP was in your earlier docker ps. Very strange that it's not in the docker-compose ps list!

2017-06-15 16:29
we're tracking down a very critical typo in the docs... default password is "rocketsk8ts" - it is correct in the source, but not updated in the generated doc pages

2017-06-15 16:30
NOTICE: DOCS ARE UPDATED

greg
2017-06-15 17:21
I need to move tip to get latest updated. We may need to cut a release to update stable.

zehicle
2017-06-15 17:32
I had RTD rebuild and it worked

2017-06-15 18:38
@zehicle docker-compose ntp up -d resulted in a "No such command: ntp". No ntp in docker-compose.yml either.

greg
2017-06-15 20:33
@Jack-likes-to-code: did you modify common.env? That can alter NTP operations.

2017-06-15 20:40
@zehicle I haven't touched that file. Don't know what it's used for

2017-06-15 20:42
Here's what I have for the NTP section in that file: ``` # # NTP Parameters # # Should we run NTP as intermediate or only # If EXTERNAL_NTP_SERVERS is not specified, # the local time in the container will be used. # NTP_RUN_PROXY=YES # This defines the upstream NTP servers # If NTP is running, it will forward to these. # If NTP is not running, then this is injected set. # Comma separated list of ips or names EXTERNAL_NTP_SERVERS= ```
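
Because the paste above lost its line breaks, it is hard to tell by eye which settings are active and which are commented out. A short Python sketch that reads KEY=VALUE lines from such a file and summarizes the two NTP settings named above. The sample text is an assumption about how the flattened paste breaks into lines, `parse_env` and `ntp_summary` are hypothetical helpers, and how Digital Rebar itself interprets these values is not shown in this log.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def ntp_summary(settings):
    """Return (run_proxy, upstream_servers) from parsed settings."""
    run_proxy = settings.get("NTP_RUN_PROXY", "").upper() == "YES"
    raw = settings.get("EXTERNAL_NTP_SERVERS", "")
    servers = [s.strip() for s in raw.split(",") if s.strip()]
    return run_proxy, servers


if __name__ == "__main__":
    # Assumed line layout of the NTP section quoted above.
    sample = """\
# NTP Parameters
NTP_RUN_PROXY=YES
# Comma separated list of ips or names
EXTERNAL_NTP_SERVERS=
"""
    print(ntp_summary(parse_env(sample)))
```

If NTP_RUN_PROXY turned out to be commented out in the real file, the first value would come back False, which would be worth knowing before re-running the install.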

2017-06-15 21:11
under digitalrebar/deploy/compose look at the init_files.sh script. That changes that compose.yml. Something in your system must have tweaked inputs to that file when you restarted. it could have been an environment variable.

2017-06-15 21:12
i do remember when I first did the quickstart, the ntp container was started. When I shut it down, it did not stop/remove the ntp service so i manually did that via docker

2017-06-15 21:13
then when I started DR via docker-admin, the ntp service did not come back up

2017-06-15 21:14
who kicks off init_files.sh? Should I kick it off again?

zehicle
2017-06-15 22:01
the install script uses it at the end of the ansible run

zehicle
2017-06-15 22:02
the run-in-system script can be run multiple times

zehicle
2017-06-15 22:02
but will tear down the running system

2017-06-15 22:02
ok. I may try that another time, then.

zehicle
2017-06-15 22:02
if you want to keep your config, then you need to modify the config files

zehicle
2017-06-15 22:02
those are input into the install

2017-06-15 22:02
It's fairly easy to setup, so no harm done in resetting it up

2017-06-19 18:23
Hi, is it possible to use CoreOS as the OS? Thanks

greg
2017-06-19 19:17
For what piece of code?

2017-06-19 19:20
The master and nodes (I already have an Ubuntu box with docker-compose running)

greg
2017-06-19 19:42
okay - so DR as opposed to DRP.

greg
2017-06-19 19:43
I've been playing with CoreOS and DRP. It is possible. DR should be similar

greg
2017-06-19 19:43
I created a custom bootenv that served the kernel and initrd and then served an ignition file that started the components I wanted.

greg
2017-06-19 19:43
It could have installed instead.

greg
2017-06-19 19:44
I don't have anything checked in. It was more a play-with-it-to-see if it could be done.

greg
2017-06-19 19:44
Nothing real.

greg
2017-06-19 19:44
Right now

spector
2017-06-26 15:02
has joined #json

spector
2017-06-26 15:02
Spector here

2017-06-27 12:39
Hello! I have been looking at the OpenStack Helm project and ran into digital rebar.. It looks pretty impressive! I noticed however that the quick start guide does not include bare metal provisioning. My goal is to get to this point: https://www.youtube.com/watch?v=6xuVm9PJ2ck Any advice?

zehicle
2017-06-27 13:41
the docs will take you through a physical install and there are videos (http://digital-rebar.readthedocs.io/en/latest/deployment/README.html). We (RackN) would be happy to have a call with you to talk about the environment before you start. We find that a short call can save a lot of time in networking.

2017-06-27 13:50
I am much more of a video person (which I am super thankful that y'all have so many!). Do y'all have any videos covering the setup with bare metal?

2017-06-27 23:04
it's different for each person due to their setup - that's why we suggest a live meeting.

2017-06-29 22:32
I did the quick config on a 32GB Ubuntu 16.08 clean install: curl -fsSL https://raw.githubusercontent.com/digitalrebar/digitalrebar/master/deploy/quickstart.sh | bash -s -- --con-provisioner --con-dhcp --admin-ip=192.168.88.228/24

2017-06-29 22:36
Of which the 192.168.88.228 was the DHCP IP handed out on the .88 sub by a mikrotik router.... I'm still waiting for the ansible playbook to end, but I had started to poke around on the web UI... I had earlier installed the provisioner ONLY on another machine on another test network

2017-06-29 22:38
Your comments above lead me to believe the provisioner is not installed ? http://provision.readthedocs.io/en/stable/doc/quickstart.html

greg
2017-06-29 22:41
You are mixing docs.

greg
2017-06-29 22:42
Your command line is for digitalrebar and not digitalrebar provision (the doc link). You want the http://digitalrebar.readthedocs.io/en/stable/doc/quickstart.html

greg
2017-06-29 22:42
With that command line, you should have the provisioner and the dhcp server

2017-06-29 22:47
OK... I thought so, just was not sure if the other is better to use for metal ? I noticed you've broken it out into an independent github repo... but you've got such flexibility with all these services, I have to check the 'spinning top'

greg
2017-06-29 23:06
Yeah - we are slowly splitting things out for better control and non-monolithic services. Slowly.

jacob
2017-06-30 16:14
has joined #json

2017-06-30 16:29
OK... I 'think' I understand my DHCP/PXE problem... see if I can explain as I'm sure it's just me missing something. I've got a HPE C7000 that I'm testing with... I pulled all the blades and drives, put in one blade and one new drive, loaded Ubuntu on that and did the quick config mentioned above. I think my issue is that the fabric switches (Cisco 3020), OnboardAdmin Module, and iLO use the little MikroTik to grab management IPs via DHCP. I listened to your RackN Digital Rebar Training (006): Configure DHCP and Network, where I extracted the fact that you need to put an IP on the admin DHCP network. So, I went into the UI, DHCP Subnets, admin-internal where the subnet was 192.168.99.0/24. Given that network, I added 192.168.99.2 IP to enp3s0 (static) and ifup. I went out to the mikrotik and disabled its DHCP. I then inserted another blade, set it to PXE boot.... thinking this 'should' work.

2017-06-30 16:32
The new blade is now in a PXE boot-loop, not finding the DHCP/PXE that I 'think' is running on 192.168.99.2 enp5s0

2017-06-30 16:33
OR... does the 'ADMIN_IP' need to change from 192.168.88.2 enp3s0 (as installed) to 192.168.99.2 enp5s0

2017-06-30 16:35
I'll go through that video again and take another look at the docs... that video does have a lot of deep 'gems' plus I know it showed how to get to the logs...

2017-06-30 16:39
What I might do is put DR on a machine OUTSIDE of the c7000 and let the DR machine capture all the DHCP... I may be fighting all the management IP DHCP from the OA, iLO and interconnect switches of the c7000.

stanchan
2017-07-03 19:52
has joined #json

greg
2017-07-03 22:07
Published v3.0.5 of DRP - It has doc updates and some UI updates.

2017-07-03 22:17
@ctrees I think you've crossed into the "let's get on the phone" type of questions

2017-07-03 22:41
I'm good with skype ?

2017-07-03 22:42
sorry to delay you... needs to hold off for the holiday. want to work 1x1 to find times for Wednesday?

2017-07-03 22:46
oh I'm in no big hurry... plus it's motivating me to dig into the docs (I just finished a read through)...

ctrees
2017-07-03 23:30
has joined #json

alan.mcalexander
2017-07-05 21:17
has joined #json

ctrees
2017-07-11 20:36
Does the RackN team know about gns3 http://gns3.com ? what I'm attempting now is to create a network training platform so that the dev and ops team can re-use infrastructure AND the OpenDaylight controller

greg
2017-07-11 20:39
I'm not aware of it.

ctrees
2017-07-11 20:41
I'm almost thinking I could create components for each of the RackN docker containers and put them into a gns3 appliance template... the idea would be netops folks would be able to grok dev parts (new docker) while dev folks can better utilize all the cool dr inventory management

greg
2017-07-11 20:43
maybe - not sure.

ctrees
2017-07-11 20:46
When I was attempting to educate both dev and ops on why/how we should clean up the subnets, I started to build up the network model in gns3 so I could let them inspect the network traffic with wireshark.... the newer gns3 server is doing about the same thing (arch wise) as you were doing (making all services in containers)...

greg
2017-07-11 20:47
okay - I'd need to look at it and see.

ctrees
2017-07-11 20:48
anyway... If I get something useable I'll link it here... I'll attempt it, just when I dug into its V2.0.x release it fits very well with your containers...

ctrees
2017-07-11 21:52
I think this is where I get hung up with dr install... based on:

ctrees
2017-07-11 21:52
curl -fsSL https://raw.githubusercontent.com/digitalrebar/digitalrebar/master/deploy/quickstart.sh | bash -s -- --con-provisioner --con-dhcp --admin-ip=1.1.2.3/24

ctrees
2017-07-11 21:52
from the github quickstart...

ctrees
2017-07-11 21:53
should I first set a static IP on the 'clean' ubuntu that ONLY has the one route to gateway then use that static as the admin-ip ?

ctrees
2017-07-11 21:56
in reality I am putting this in a nested vm using vmware with an ubuntu that runs qemu and docker... which I'll attempt later to see if I can package your docker images into that... but for now, I am launching a vmware vm (ubuntu) to just run dr but let the gns3 manage the network (it is running the pfsense router within a virtual network which can be nat'd out to internet)

ctrees
2017-07-11 21:58
Just as RackN has a way to bring up qemu pxe clients, gns3 does the same thing but for cisco and other router os's

2017-07-11 22:00
OH.. that confused me a bit... the 'Rob' bridge echo

greg
2017-07-11 22:24
admin-ip is what DR is going to advertise as itself when nodes try and talk back to it.

greg
2017-07-11 22:24
Nodes should be able to talk to DR through that IP.

ctrees
2017-07-11 22:25
so it has to hear broadcast traffic....

greg
2017-07-11 22:25
well - that is the DHCP aspect. It doesn't if you have helpers that can direct to that IP. DHCP is a little different because of its L2 specific nature.

greg
2017-07-11 22:26
DHCP server will do the "right" thing with regard to picking interfaces and IPs to respond within the DHCP protocol. The challenge is in the provisioner and beyond and that is controlled by the admin-ip.

ctrees
2017-07-11 22:26
which was where I think I got caught before, as I had multiple IP's and I couldn't figure out how it was listening or where in the scripts the route tables were added...

ctrees
2017-07-11 22:27
what I should go do is figure out why/how the forwarder service works... that'll probably explain it ??

greg
2017-07-11 22:28
ugh - the bane. We have a pull request that we haven't finished to remove FORWARDER. I'm not overly fond of it for general use.

ctrees
2017-07-11 22:28
as my guess ?? is the forwarder is how you deal with virtual routes internally... ?

greg
2017-07-11 22:28
FORWARDER mode was mostly so we could run a system on a system without having to kill dnsmasq.

greg
2017-07-11 22:29
In general, it causes more pain than good.

ctrees
2017-07-11 22:29
not going to use it, but that sort of 'inception' thing is basically the only thing I get tripped up on...

greg
2017-07-11 22:30
I tend to do HOST for just about everything. FORWARDER only really works (you may force it) for linux boxes running isolated kvms that are also attached to the docker bridge.

greg
2017-07-11 22:30
Or bridging whole nics onto the docker bridge.

ctrees
2017-07-11 22:30
I'm hoping to setup 'real router' os simulations that then hook into the newer SDN stuff...

ctrees
2017-07-11 22:31
OH... so what's the 'do all' command line for HOST ? just leave off the admin-ip command ?

greg
2017-07-11 22:32
The model I've been preferring is that you shouldn't really think of DR as a collection of services running in docker, but a single system that provides some endpoints. HOST mode fits this view better.

ctrees
2017-07-11 22:32
.... never mind... I need to just figure it out... and your scripts have it all when I dig... you're just way too 'flexible' :wink:

ctrees
2017-07-11 22:33
OK... so if I do a VM (clean install of ubuntu) which do-it-all script should I run and how should I bring up test pxe (in qemu or other vms)

ctrees
2017-07-11 22:35
Oh... the github quick dangerous was hostmode... so I'm good

greg
2017-07-11 22:35
yes

ctrees
2017-07-12 15:59
In a 'test-lab' situation, is it better to have 2 network cards and let the host do the route/nat/firewall also ? aka the admin-ip becomes the gateway ?

ctrees
2017-07-12 16:02
basically I'm thinking if I tell them this replaces the 'soho router' for testing... they'll grok it faster...

greg
2017-07-12 16:42
That is how I kvm test - it also lets me then do isolated testing with DR as webproxy and without.

ctrees
2017-07-12 16:46
so you do that 'with a vm' and 'on real' h/w... it's the nesting of vm and the network stack as seems like lots of tools are now attempting to 'help' make adjustments... netstat -rn is showing me lots of adjustments... (more from VMware, VirtualBox and GNS3... so it's not really anything to do with DR, but through the DR scripts I'm finding out how you guys deal with those situations)

ctrees
2017-07-12 16:52
Say... while I'm on my little network inception mind-bend... would you just PXE boot to recycle or use Mesos ? or go kubernetes... I know it 'depends' but was wondering when I watched your packet demos how you would add additional packet servers to the test cluster (as you were just adding qemu vm's in that video demo)

ctrees
2017-07-12 16:56
seems like you guys have basically tested them all... seems like going back down to metal with PXE is the cleanest for recycle... I eventually want to attempt to recycle to move equipment to a new resource pool which I'm pretty sure you guys have done

greg
2017-07-12 17:08
We usually recommend a complete rebuild. Mesos should work because it handles dynamic workers. K8S is ideal. Workers are supposed to be replaceable.

2017-07-12 17:10
so that's why the drive to K8S demos then

greg
2017-07-12 17:16
well - it is also becoming more popular than Mesos. It appears. Lots of movement there.

ctrees
2017-07-12 19:38
So is Goiardi eventually going away ? which container drives the Annealing process ? I take it server state status is kept in the postgres db then changes are change events that trigger the annealing... just not sure what service does it

greg
2017-07-12 19:40
rebar-api drives annealing process.

greg
2017-07-12 19:40
goiardi is a go-based chef server. It is used by some roles and currently won't go away for a while.

greg
2017-07-12 19:41
rebar-api is a rails app that handles the API layer (mostly) and a set of worker threads do annealing.

ctrees
2017-07-12 19:48
I noticed the consul, so I was wondering... you must have written the rebar-api pre terraform ? (more evolution curiosity is all...)

greg
2017-07-12 19:51
yes, but rebar-api does a lot more than terraform ever will and less at the same time.

ctrees
2017-07-12 19:52
more the hope to avoid ruby... at least hashi started to put newer stuff in Go

greg
2017-07-12 19:52
Well - the core of that rebar-api was started 8 years ago.

greg
2017-07-12 19:53
We are moving it to Go over time.

ctrees
2017-07-12 19:53
which I know you guys are doing also (Go)... yea and you were ops guys so hard to avoid ruby 8 years ago...

greg
2017-07-12 19:54
we needed an API endpoint that was the UI as well. That pretty much meant rails or a really bad state of django or some java thing.

greg
2017-07-12 19:55
I find terraform confounding. It is good but unbounded. Much like ansible.

greg
2017-07-12 19:57
You can do anything and everything and so people do and there is very low repeatability, testability, and abstraction. It makes it really hard to isolate problems or operations.

ctrees
2017-07-13 16:35
Clean install on Ubuntu VMWare, 2 network (host only and a bridge to internet)

ctrees
2017-07-13 16:35
TASK [Update repos (was not working from apt:)] ******************************************************************************************************************
[WARNING]: Consider using apt module rather than running apt-get
fatal: [172.16.240.2]: FAILED! =>

ctrees
2017-07-13 16:36
Digital Rebar UI https://172.16.240.2
cat@ubuntu:~/digitalrebar/deploy$ sudo ./run-in-system.sh --deploy-admin=local --access=HOST --admin-ip=172.16.240.2

greg
2017-07-13 16:37
Admin IP needs a CIDR

ctrees
2017-07-13 16:39
sorry... what's CIDR ? ? env thing ?

ctrees
2017-07-13 16:41
or is that set up in the startup... I could just reset... and use quickstart... I didn't realize quickstart needs 6GB (had 4GB set)...

ctrees
2017-07-13 16:42
but I saw the attempt at root in the past... thinking it's an ssh key / ansible thing


ctrees
2017-07-13 16:44
OK... --admin-ip=172.16.240.2/24
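
For anyone else tripping over the same thing: "needs a CIDR" means the admin-ip has to be written as address/prefix-length rather than a bare address. A minimal sketch of a pre-flight check, assuming nothing about how the deploy scripts themselves parse the flag (the helper name `check_admin_ip` is made up for illustration):

```python
import ipaddress


def check_admin_ip(value):
    """Parse an admin-ip given in CIDR form (address/prefix), e.g. 172.16.240.2/24.

    Hypothetical helper, not part of the Digital Rebar scripts.
    Raises ValueError when the prefix length is missing or the address is invalid.
    """
    if "/" not in value:
        raise ValueError(f"{value!r} has no prefix length; expected e.g. {value}/24")
    return ipaddress.ip_interface(value)


iface = check_admin_ip("172.16.240.2/24")
print(iface.ip, iface.network)  # 172.16.240.2 172.16.240.0/24
```

The /24 here corresponds to the 255.255.255.0 mask on the host-only interface in this setup.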

ctrees
2017-07-13 16:49
failure on same step...

ctrees
2017-07-13 16:49
cat@ubuntu:~/digitalrebar/deploy$ netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.9.1     0.0.0.0         UG        0 0          0 ens33
172.16.240.0    0.0.0.0         255.255.255.0   U         0 0          0 ens38
192.168.9.0     0.0.0.0         255.255.255.0   U         0 0          0 ens33
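
A quick way to sanity-check which interface a given admin-ip would sit behind, using only the three routes pasted above. This is a rough sketch of longest-prefix matching, not anything the DR scripts do themselves; `ROUTES` and `iface_for` are names invented here:

```python
import ipaddress

# (destination, genmask, interface) copied from the netstat -rn output above.
ROUTES = [
    ("0.0.0.0", "0.0.0.0", "ens33"),
    ("172.16.240.0", "255.255.255.0", "ens38"),
    ("192.168.9.0", "255.255.255.0", "ens33"),
]


def iface_for(ip):
    """Return the interface of the most specific route that contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for dest, mask, iface in ROUTES:
        net = ipaddress.ip_network(f"{dest}/{mask}")
        # Longest prefix wins, the same rule the kernel applies.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, iface)
    return best[1] if best else None


print(iface_for("172.16.240.2"))  # ens38
```

With this table the admin-ip lands on ens38 (the host-only network), and anything else falls through to the default route on ens33.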

ctrees
2017-07-13 16:50
cat@ubuntu:~/digitalrebar/deploy$ ifconfig
ens33     Link encap:Ethernet  HWaddr 00:0c:29:42:6b:8e
          inet addr:192.168.9.62  Bcast:192.168.9.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe42:6b8e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6618 errors:0 dropped:0 overruns:0 frame:0
          TX packets:361 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2917977 (2.9 MB)  TX bytes:34729 (34.7 KB)
ens38     Link encap:Ethernet  HWaddr 00:50:56:39:54:5f
          inet addr:172.16.240.2  Bcast:172.16.240.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fe39:545f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1563 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1166 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:122905 (122.9 KB)  TX bytes:441659 (441.6 KB)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:301 errors:0 dropped:0 overruns:0 frame:0
          TX packets:301 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:36506 (36.5 KB)  TX bytes:36506 (36.5 KB)

greg
2017-07-13 19:33
yes

ctrees
2017-07-14 17:30
So I got the quick install running on Ubuntu (under VMWare as it supports nested virtualization). I added another VMWare VM, set it to PXE boot, and it did... (the node showed up in the DHCP with its MAC and IP) but when it rebooted, the VM image seemed to have lost the network card used during the PXE boot. The Host system still sees the IP and the route but I don't think sledgehammer can report info to DR.

ctrees
2017-07-14 17:34
I suspect it's a VMWare thing just wondering if you've ever seen this issue.

ctrees
2017-07-14 18:22
Tried the 'devices.hotplug = "FALSE" ' .vmx mod solution


ctrees
2017-07-14 18:23
no happiness...

greg
2017-07-14 18:32
You need to make sure that the mac is set and not changing. Also, the client vm should have only one NIC in one network.

ctrees
2017-07-14 18:35
Yup... I set MAC static AND only one NIC... again I'm pretty sure it's VMWare dropping the darn thing, just wondering if you've run into it.

ctrees
2017-07-14 18:39
The only reason I'm doing the VM stuff is so the devs and ops guys can review each other's mods on an isolated system... I'm replicating the same setup in the C7000 blade center right now... I don't expect to have an issue with real metal as I've seen it work already. The C7000 blade setup had issues with its control plane wanting / defaulting to DHCP and needing that to go access the VLAN stuff... all this inception stuff for isolation

ctrees
2017-07-14 22:07
So the best way to bring a whole grid down is digitalrebar/deploy/backup.sh then bring back up with digitalrebar/deploy/restore.sh


ctrees
2017-07-14 22:08
backup.sh has a comment that I'm not following...

ctrees
2017-07-14 22:08
# This script handles getting the first 5 items, getting the third is # left as an exercise for the reader. #community

ctrees
2017-07-14 22:09
it names 3 data sets, then 5 items, then mentions 'the third' is left for the reader ?

zehicle
2017-07-14 23:18
"The file data for Goiardi" - we don't recover Chef information

zehicle
2017-07-14 23:19
but... it looks like it does in the script. It's possible that comment is out of date

ctrees
2017-07-17 15:56
I viewed the new Packet IPXE Test w DRP Endpoint RackN & Digital Rebar video. What is the best source to setup the DHCPd Options and/or DNSMasq Options to emulate how the PXE hand-off ( as seen in video timecode: 4:22 )

ctrees
2017-07-17 16:11
Following that hand-off sequence seems to be key to putting DR into an existing PXE setup. I found: http://provision.readthedocs.io/en/stable/doc/arch/data.html#rs-dhcp-models - 6.3.2.1 Subnet, but also remember seeing a more detailed DR doc about the PXE process, but cannot find it again.

ctrees
2017-07-17 16:57
I take it the 'custom iPXE' option basically maps a 'Request Next Boot' to the provisioned mac I think I'm just struggling with what the first PXE boot is on the main DNS... I am just assuming on the packet side they are setting up a VLAN but maybe not... it all could be my weakness in following PXE... so your new video showed how the packet server was bouncing through a few NEXT BOOT (I think)

ctrees
2017-07-17 19:05
OH... I think I get it now:
1 - Server gets IP 147.75.90.81 (very small subnet) via DHCP at 147.75.200.3 and asks for PXE (video TC: 4:19)
2 - DHCP server sends http://147.75.200.3/auto.ipxe (http://packet.net Baremetal boot image i'm guessing)
3 - (guessing as video refresh hits) auto.ipxe has 'next server' set to what was entered in the iPXE config: http://packet.rebar.digital/default.ipxe
4 - Server (PXE proto) looks for which does not exist... so it uses http://packet.rebar.digital/default.ipxe to find the next server
5 - At this point the server does 'goto sledgehammer' which points to http://147.75.73.23:80/sledgehammer/b3c09ebd5a9c228c66d8a617b6f5d10ccbe1c273/vmlinuz0
6 - THEN this image and its control string adjusts the console... and loads the stage1.img of sledgehammer
7 - I assume now that's the discovery image but it's going to REPORT findings back to 147.75.73.23 ?

greg
2017-07-17 19:09
yes

greg
2017-07-17 19:09
That is the flow

ctrees
2017-07-17 19:12
Thanks... so sometimes when I load up DR, I don't see the docker ports mapped to Host ports... which part of the Ansible does that docker mapping, or can those run out-of-order ?

greg
2017-07-17 19:12
it is part of compose and docker

greg
2017-07-17 19:12
Ansible runs it


greg
2017-07-17 19:18
yep

wdennis
2017-07-18 16:45
@greg About ready to try an DRP upgrade from v3.0.3 to v3.0.5 - just follow http://provision.readthedocs.io/en/stable/doc/upgrade.html right?

greg
2017-07-18 16:51
Yes - I believe that most of the changes are all internal for those releases and docs.

wdennis
2017-07-18 16:54
Cool

wdennis
2017-07-18 16:57
Also, why do I have to do: `../drpcli bootenvs install bootenvs/[...]` instead of `./drpcli bootenvs install assets/bootenvs/{...}` ???

greg
2017-07-18 16:58
lame answer (kinda): because the install command assumes that there is an isos and templates directory that are peers to bootenv at the cwd of execution.

wdennis
2017-07-18 16:58
ah, thought that might be it

greg
2017-07-18 16:59
It needs to import isos (store isos) and import templates (if not found).
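
Since the install command resolves everything relative to the current directory, a small pre-check can catch the "forgot to cd into assets" case before drpcli is ever invoked. A sketch only: the directory names come from the explanation above, while the helper `missing_asset_dirs` is hypothetical and not part of drpcli:

```python
from pathlib import Path

# Directories the install step is described as expecting side by side in the cwd.
EXPECTED_DIRS = ("bootenvs", "isos", "templates")


def missing_asset_dirs(cwd):
    """Return the expected peer directories that are absent under cwd."""
    return [name for name in EXPECTED_DIRS if not (Path(cwd) / name).is_dir()]


missing = missing_asset_dirs(Path.cwd())
if missing:
    print("run this from the assets directory; missing:", ", ".join(missing))
else:
    print("layout looks right for: ../drpcli bootenvs install bootenvs/<file>")
```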

wdennis
2017-07-18 17:03
I just always forget that I have to cd into "assets" before I do that

greg
2017-07-18 17:05
we are working on some big changes to how content will work and be viewed and distributed. The goal is to make it easier to track, update, and display. We'll see if we succeed at that, but ..

wdennis
2017-07-18 17:05
On DRP specifically you mean? Or full DR

greg
2017-07-18 17:08
DRP

ctrees
2017-07-19 20:12
So... can I run the DRP with DR ? (I assume DRP is the new tip of what was the Provisioner container) ? digitalrebar/dr_provisioner

greg
2017-07-19 20:13
not really. There are some tricky join actions and things are not that simple. Yes, DRP replaces provisioner and dhcp containers, but it doesn't integrate cleanly.

ctrees
2017-07-19 20:15
So workloads or not to workload is the divide between them (DR vs DRP) ?

ctrees
2017-07-19 20:16
Or is DRP's sole purpose a crowbar update ?

ctrees
2017-07-19 20:18
woops... crowbar -sb-> Cobbler

greg
2017-07-19 20:19
workloads are currently DR only. DRP may have something similar one day or we may plug it into DR. The problem is that most people find DR too complex for their needs and want DRP with a little more: Cobbler with a few more features.

greg
2017-07-19 20:20
The plan is to get DRP into DR (or replace DR with a smaller thing that uses DRP) at some point. We are working to get DRP fully functional to what we want first.

ctrees
2017-07-19 20:29
Workloads REQUIRES DR makes sense... thanks!

2017-07-19 23:45
Got a quick question: I set up digital rebar provision, and when it tries to upload any iso into digital rebar provision I get a context deadline exceeded. Any ideas? I just followed the quickstart, no other changes.

greg
2017-07-19 23:51
Does DRP have access to the internet?

2017-07-19 23:52
ya im able to download the iso

greg
2017-07-19 23:52
Are you running as root?

2017-07-19 23:52
no let me try doing it that way one sec

greg
2017-07-19 23:52
well wait.

greg
2017-07-19 23:53
did you do ```sudo ./dr-provision ....```

greg
2017-07-19 23:53
?

2017-07-19 23:53
yes

greg
2017-07-19 23:53
Do you have passwordless sudo and did you put it in the bg with &

greg
2017-07-19 23:53
because sometimes - drp hangs waiting for a password to run it.

2017-07-19 23:53
no passwordless sudo, i had to put in the password

greg
2017-07-19 23:53
okay

greg
2017-07-19 23:54
does ```drpcli bootenvs list``` return almost immediately?

2017-07-19 23:54
let me try

2017-07-19 23:56
ya instant

greg
2017-07-19 23:57
ok

greg
2017-07-19 23:57
thinking

greg
2017-07-19 23:57
What command are you running that fails?

2017-07-19 23:58
../drpcli bootenvs install bootenvs/blahblah.yml

2017-07-19 23:58
well not the blah blah but u get the picture

2017-07-19 23:58
any of the yml files

2017-07-19 23:59
it downloads the isos, and they do show in the isos folder but when it tries the upload step it fails with that error

greg
2017-07-19 23:59
What version are you using? stable/default or tip?

2017-07-20 00:00
then one from the quickstart

greg
2017-07-20 00:00
okay - stable if you didn't add ```--drp-version=tip```

2017-07-20 00:00
ya stable

greg
2017-07-20 00:00
Sooo - I'm not sure. What are you running on? CPU, memory, and disk?

greg
2017-07-20 00:01
Do you have enough space is the real question.

2017-07-20 00:01
hyper-v ubuntu vm 100gb of hd space for the vm

greg
2017-07-20 00:01
ok - should be okay.

2017-07-20 00:01
16.04 version of ubuntu

greg
2017-07-20 00:01
next cheat to get around this problem. I'll have to try it to be sure, but was working yesterday.

greg
2017-07-20 00:02
Is this a custom bootenv you built or one of the defaults?

2017-07-20 00:02
default

greg
2017-07-20 00:02
we've only really been testing centos7 or ubuntu16.04 bootenvs.

greg
2017-07-20 00:02
Not sure if the others will work.

2017-07-20 00:03
i tested 14.04 and that one worked but when it booted it couldn't reach the repo to download the rest of the install

2017-07-20 00:03
the other ones gave me the error even centos

greg
2017-07-20 00:03
Yes, your vms/nodes have to have internet access to use the ubuntu images.

greg
2017-07-20 00:03
hmm - okay.

greg
2017-07-20 00:04
in the provision directory where you installed. There should be a drp-data directory

greg
2017-07-20 00:04
inside that, you have tftpboot/isos

greg
2017-07-20 00:04
that is where the iso go.

greg
2017-07-20 00:04
can you check that directory? and look at the contents.

2017-07-20 00:05
ubuntu is there well the 14.04 one

2017-07-20 00:05
and sledgehammer

2017-07-20 00:05
sledgehammer works by the way

greg
2017-07-20 00:06
cool

greg
2017-07-20 00:06
hmmm - drpcli bootenvs show <bootenv in question>

greg
2017-07-20 00:06
does the errors field have anything?

2017-07-20 00:07
let me see

greg
2017-07-20 00:07
It seems like space or quotas or something.

greg
2017-07-20 00:07
since you've uploaded some, but not this one.

greg
2017-07-20 00:08
you can copy the iso from the assets/isos dir into the tftpboot/isos directory and then update the bootenv.

greg
2017-07-20 00:08
drpcli bootenvs update - < bootenvs/<bootenv filename>

greg
2017-07-20 00:08
I think that will "explode" the iso for that bootenv.

2017-07-20 00:09
so if it was lets say ubuntu 16.04 it would be drpcli bootenvs update -<bootenvs/ubuntu-16.04 ?

2017-07-20 00:09
ls

2017-07-20 00:10
oops

greg
2017-07-20 00:10
```drpcli bootenvs update ubuntu-16.04-install -<bootenvs/ubuntu-16.04```

2017-07-20 00:10
k i will give that a try

2017-07-20 00:11
thanks

greg
2017-07-20 00:11
otherwise, I'm running out of options. Stopping and starting drpcli will also explode isos, I think.

greg
2017-07-20 00:11
I need to step away for a while. back in an hour or so

2017-07-20 00:11
ill try running as root and see what happens as well

2017-07-20 00:11
no worries thanks

zehicle
2017-07-20 02:46
I have the right ISO and can use the CLI to upload. How do I know what to name it?

greg
2017-07-20 03:23
The file knows

greg
2017-07-20 03:24
VMware-VMvisor-Installer-201701001-4887370.x86_64.iso

zehicle
2017-07-20 03:31
thanks, I thought it was the name

zehicle
2017-07-20 03:33
that fixed it

2017-07-20 03:37
I was able to get around the error by uploading the iso to the tftpboot/isos dir and then running the bootenvs command

2017-07-20 03:38
was able to run the ubuntu install fully as well just had to change the dns to my router - for some reason it would not work using the digital rebar provision vm as the gateway or dns

zehicle
2017-07-20 03:38
could have to do w/ how you configured the subnets

2017-07-20 03:39
probably ya

greg
2017-07-20 03:56
Yeah - we don't webproxy with DRP.

zehicle
2017-07-20 03:58
RE on ESXi installs.... you need to ensure min RAM / CPU requirements

zehicle
2017-07-20 03:58
but mine is still hanging on lsu_lsi install part

zehicle
2017-07-20 03:59
@jj did you get past that?

jj
2017-07-20 13:33
yeah it did when i was testing it

2017-07-22 00:35
https://t.co/fNGrXZGneF third video is not the right link ;)

2017-07-22 00:35
on the packet blog post

zehicle
2017-07-22 02:33
oops! I'll let them know

zehicle
2017-07-27 19:24
officially working on t-shirt designs.... check out http://99d.me/c/fwje

2017-07-27 21:31
hi

2017-07-27 21:32
Can't pass the following in the quickstart:
TASK [Link dirs] ****************************************************************************************************************************************************************
fatal: [192.168.2.183]: FAILED! => {"changed": false, "failed": true, "gid": 0, "group": "root", "mode": "0755", "msg": "refusing to convert between directory and link for /home/administrator/digitalrebar/deploy/compose/digitalrebar", "owner": "root", "path": "/home/administrator/digitalrebar/deploy/compose/digitalrebar", "size": 4096, "state": "directory", "uid": 0}
to retry, use: --limit @/home/administrator/digitalrebar/deploy/digitalrebar.retry

greg
2017-07-28 00:45
not sure how you are running it, but you can remove /home/administrator/digitalrebar/deploy/compose/digitalrebar. It should be a link. If it isn't, remove the directory. If it is a link, remove it. Then rerun.
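
The "remove it whether it is a link or a directory" step can be sketched as below. This is only an illustration of the cleanup described above, not something taken from the deploy scripts; `remove_link_or_dir` is a made-up name, and removing a real directory is destructive, so check what is inside before pointing it at anything:

```python
import shutil
from pathlib import Path


def remove_link_or_dir(path):
    """Remove path whether it is a symlink, a real directory, or a plain file.

    A symlink is unlinked without touching its target; a real directory is
    removed recursively. A missing path is left alone.
    """
    path = Path(path)
    if path.is_symlink():
        path.unlink()
    elif path.is_dir():
        shutil.rmtree(path)
    elif path.exists():
        path.unlink()


# Example (path taken from the error message above), then rerun the quickstart:
# remove_link_or_dir("/home/administrator/digitalrebar/deploy/compose/digitalrebar")
```

The symlink check comes first on purpose: `is_dir()` follows links, so testing it first would delete the link's target instead of the link.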

2017-07-28 11:28
Hmm Ill debug this further

2017-07-28 19:04
I'm trying to bring up a vagrant node and join an existing vagrant base box to test with and am having difficulty running the join_rebar.sh.

2017-07-28 19:05
when running the curl commands from the script i'm unable to connect to admin node port 3000

2017-07-28 19:06
I'm able to access the gui for digital rebar with out any issues... just need to join an existing node

greg
2017-07-28 19:10
yeah - that script is old. It wasn't updated when we unified all API accesses through the auth proxy. I would say, you just have to modify the script to not use port 3000, but I don't think that is all that needs to be run. Which join_rebar.sh script are you using?

2017-07-28 19:11
Yea i changed to point to https://admin.node.ip. and all works except for anything with a PUT

2017-07-28 19:11
so I can't register the node

zehicle
2017-07-28 19:11
unless you have a long term plan for Vagrant, I'd suggest just using VMs to test. Vagrant does not handle the PXE boot

2017-07-28 19:12
gotcha I was just going to run some local deployments non baremetal if possible.

zehicle
2017-07-28 19:13
I understand, I was hoping the Vagrant path would allow faster testing cycles when we did that work by avoiding PXE cycles. Turned out that it was faster to just work against cloud VMs for that use case

zehicle
2017-07-28 19:13
(we added the provider at that point)

zehicle
2017-07-28 19:14
what are you trying to validate? it's possible that Provision may be sufficient (and simpler)

2017-07-28 19:15
yes, I just didn't have a public cloud available to test with, was trying to validate on a laptop as I've been using kismatic to deploy k8s locally in hopes testing a pipeline before moving to a cloud

2017-07-28 19:16
just trying to see if this is something I can bring to our company I'm in the beg. stages of building a new datacenter

2017-07-28 19:16
and so far its exactly what I'm looking for

zehicle
2017-07-28 19:17
:slightly_smiling_face:

mniemann
2017-07-31 19:14
has joined #json

2017-07-31 22:44
@zehicle you here?

zehicle
2017-07-31 22:44
Yes

2017-07-31 22:45
The quickstart installation doesn't seem to work after a reboot of the host holding all the docker-compose containers

2017-07-31 22:45
I get this in the rev_proxy container: 2017/07/31 22:40:39 Request failed: Get https://172.17.0.12:3000/api/v2/users/rebar/digest: x509: certificate signed by unknown authority (possibly because of "x509: ECDSA verification failure" while trying to verify candidate authority certificate "internal")

2017-07-31 22:45
-- when trying to login

2017-07-31 22:46
when did you get the containers?

2017-07-31 22:47
as in when did I run the checkout?

2017-07-31 22:47
-- or run the quickstart itself?

2017-07-31 22:47
the first install - I think there was an issue w/ consul in the containers from about a week ago

2017-07-31 22:47
no this is pretty fresh

2017-07-31 22:48
ok

2017-07-31 22:48
are you trying to re-run the quickstart or just that the containers don't restart automatically after a reboot?

2017-07-31 22:49
the containers restarted after a reboot

2017-07-31 22:49
didn't rerun quickstart. I was really relieved that it finally worked, only to find that the default didn't work.

2017-07-31 22:50
default= default credentials

2017-07-31 22:52
I really want this to work, because I want to see DR work, but I've already spent so much time on this. I've also already filed an issue on Github (bit of a rant, maybe I should edit it).

2017-07-31 22:53
Is there a good reason that DR is so incredibly resource hungry?

2017-07-31 22:58
a couple of things...

2017-07-31 23:00
1) sorry about the issues, DR has a lot of moving parts. the container packaging helps a lot but it's still complex. that's why we suggest working w/ us to build much more than a quick start

2017-07-31 23:01
2) DR has a lot of parts including a full Ruby stack, a postgresql database, chef server and other services. so there's a lot going on there. that's why we're actively rewriting it in golang

2017-07-31 23:02
3) We're suggesting people start with DR Provision as a first stage at this point. It's much (much) lighter weight and simpler to use. No docker is required at all and the functions are easier to understand

2017-07-31 23:03
I saw DR provision, but it was the more advanced stuff I was interested in, e.g., deploying Kubernetes and Ceph, and initially provisioning through PXE in a private cluster

2017-07-31 23:03
4) Inside of DR Provision, we've been building a workflow system that is simpler to use for the use cases that we hear about the most. We're also building new UX that makes it easier to manage multiple sites

2017-07-31 23:04
for those uses cases, we're putting in Ansible & Terraform integrations. Our Kubernetes work was just Kubespray anyway. Over the weekend, we added a dynamic inventory generator for Ansible with the plan to document using that as the Kubespray target

2017-07-31 23:07
that approach does not leverage DR's hybrid provider concept. While we think that function is very important long term, it has been getting in the way of people exactly like you who just want to get started quickly. For that reason, we're more excited about Provision as a fast and easy win for physical data centers.

2017-07-31 23:08
The new Provision Jobs/Tasks work (just pulled today by @galthaus ) creates a huge amount of capability to do workflows like burn-in, raid/bios, discovery & auto-classification. It's going to take a while to explain how it all works

2017-07-31 23:09
interesting stuff, eager to see how it will work out. I really think you guys are filling a gap at least somewhere in the whole hybrid-or-not cloud provisioning.

2017-07-31 23:09
TL;DR - we're strongly recommending Provision as a starting point

2017-07-31 23:09
That may meet all your needs AND you can hook it into the other DR work later too

2017-07-31 23:09
Will do, but will I succeed at this point with DR Provision with the aim of provisioning/deploying ceph and kubernetes?

2017-07-31 23:10
of course I can fill in small gaps, but is the workflow stuff ready for it?

2017-07-31 23:16
We have not integrated K8s to DR P yet. For Ceph, I'd recommend Rook (https://blog.rook.io) on top of the cluster

2017-07-31 23:17
the integration will be pretty simple: "put these profiles on the nodes you want to take on these roles, add params to the profiles to override the defaults"

2017-07-31 23:17
ansible run w/ dynamic inventory.

2017-07-31 23:18
sadly, not as easy as the DR wizard approach; however, it's more inline with the way we see people trying to use Kubespray right now.

2017-07-31 23:19
we have some interesting plans to use DR P for joining nodes to the cluster using the 1.7 node admission workflow.

2017-07-31 23:20
I think I like your above approach and will try it this week in a very small cluster

2017-07-31 23:21
btw, Rook seems rather inception-y :D, since it's deployed as containers, so that other containers can have persistent storage

2017-07-31 23:21
yes. So far, I've heard good things about it. Our other recent Ceph work was also containers using Helm via OpenStack.

2017-07-31 23:21
Thanks for your info, it was pretty enlightening

2017-07-31 23:22
I have to go though. Thanks again!

2017-07-31 23:23
you're welcome. We've been working hard to bring up all the DR P key functionality to make it easier to start there. We got the basics in place and are adding the gee wiz stuff now.

2017-08-03 22:19
Hi, 3 days ago, you said something about Job/Task provisioning, what did you mean by that?

greg
2017-08-03 22:20
The tree has been picking up changes for this the last few days and tip is changing quite a bit for it.

greg
2017-08-03 22:21
Tasks are a collection of templates that can be "run" in a bootenv environment by a runner. (The runner is built into the cli.)

greg
2017-08-03 22:22
A job is a run of a task on a machine. These have dates and logs and status. They can be retried and sequenced.

greg
2017-08-03 22:22
The idea is that you can use these to do advanced templatized actions on machines during various stages of provisioning.
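
To make the task/job distinction concrete, here is a toy model of the two concepts as described: a task is a named bundle of templates, and a job is one run of a task on one machine, with a date, a log, a status, and the option to retry. Every class and field name below is invented for illustration and is not the DRP object schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Task:
    """A named collection of templates (hypothetical shape)."""
    name: str
    templates: list


@dataclass
class Job:
    """One run of a task on one machine (hypothetical shape)."""
    task: Task
    machine: str
    status: str = "created"
    started: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    log: list = field(default_factory=list)

    def finish(self, ok, message):
        """Record the outcome of the run."""
        self.log.append(message)
        self.status = "finished" if ok else "failed"

    def retry(self):
        """A retry is a fresh job for the same task and machine."""
        return Job(task=self.task, machine=self.machine)


burn_in = Task("burn-in", ["stress.sh.tmpl"])
first = Job(burn_in, "node-01")
first.finish(False, "stress test timed out")
second = first.retry()
print(first.status, second.status)  # failed created
```

Sequencing would then just be an ordered list of tasks per machine, with one job created per task as each one completes.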

greg
2017-08-03 22:23
RackN is going to make content bundles of supported features with this system.

greg
2017-08-03 22:23
But that is all for now. More is coming on that soon.

2017-08-03 22:25
And a different question, is the use-case of _not_ installing an OS to disk, but simply always start through PXE, supported in your design?

greg
2017-08-03 22:26
Yes - in fact, we prefer and recommend that machines always PXE so that DRP has better control over the system.

2017-08-03 22:26
say I'd want a Ceph OSD image always being run by a node, but running it from mem?

2017-08-03 22:26
Ah ok

greg
2017-08-03 22:27
I have some toy, unsupported, and unpublished CoreOS templates that are basically that. There is also a LinuxKit variant in the tree currently that does something similar.

2017-08-03 22:27
So in time, DRP would have to be really high-available, since it will be crucial after a calamity, when bringing machines all up

2017-08-03 22:29
Yeah, I've been wanting to check out Linux kit. As far as you know, does LinuxKit itself assume this same model of always starting afresh?

greg
2017-08-03 22:29
well - kinda. Depends upon where and what. In the case of a locally installed OS, not as much. The boot order set to PXE then hard disk handles that kind of outage. You can't provision new systems, but existing systems function. In the case of an immutable RAM-boot OS, it is more crucial and an HA form is needed. Now, we have plans for how that would work and, strangely enough, DRP is pretty close to HA now. It is not quite there yet.

greg
2017-08-03 22:30
LinuxKit in general assumes an immutable OS layer with docker running in that layer and containers on top.

greg
2017-08-03 22:30
It is in fact that docker OS immutable layer.

greg
2017-08-03 22:31
It could be installed, but appears to be more targeted at CoreOS style downloadable OS.

2017-08-03 22:32
But furthermore, you're telling me to wait a bit for all the new functionality which is in the pipeline :) ?

greg
2017-08-03 22:32
You could, but I think you can do a lot with what is there.

2017-08-03 22:33
Ow, and one last thing, something which I couldn't find in the docs, what's the difference between the default and the unknown bootenv?

greg
2017-08-03 22:35
default bootenv is the boot environment that a machine gets when it is created without a bootenv specified.

greg
2017-08-03 22:35
unknown bootenv is the boot environment handed out by the DHCP server for a machine that isn't known.

greg
2017-08-03 22:36
unknown bootenv populates the default.ipxe, default elilo, and default lpxelinux file. It is the fall-through if nothing else is specified.

greg
2017-08-03 22:37
It is what is known as an UnknownOnly boot env. Only unknown machines will boot from it (usually, everything has nuances).

greg
2017-08-03 22:37
Examples of these are ignore and discovery.

2017-08-03 22:37
Ok, I get the last one. When would the default one kick in when using the webUI, because if I create a machine manually, I _have_ to choose a bootenv anyway.

greg
2017-08-03 22:38
The default bootenv is what the machine's bootenv becomes when the machine is created.

greg
2017-08-03 22:38
An example of this is sledgehammer (or any other non-UnknownOnly bootenv).

greg
2017-08-03 22:39
If you create a machine and specify a boot env (like ubuntu-16.04-install), that will be the machine's boot env. If the machine boots, it will get directed to that boot env and Ubuntu will be installed.

greg
2017-08-03 22:43
If you create the machine and don't specify a bootenv in the object, it will get the default. We usually set it to sledgehammer so that the machine can continue to be discovered dynamically.

2017-08-03 22:43
So: 1. discovery (assuming i had the unknown bootenv set to sledgehammer for example). 2. Once discovered it will assign it the default bootenv?

2017-08-03 22:43
ehh the second 1. should've been 2.

greg
2017-08-03 22:43
unknown is set to discovery.

greg
2017-08-03 22:43
then it will set the node to sledgehammer (it is a bit pedantic about that).

greg
2017-08-03 22:44
We could have left it blank and let the default take over.

greg
2017-08-03 22:44
Actually, just a second..

greg
2017-08-03 22:45
Yeah - I remembered "rightly". The discovery template control.sh creates the machine with sledgehammer.

greg
2017-08-03 22:45
We didn't want to force install things without user intervention by accident.

2017-08-03 22:46
Sorry, but then I still don't understand at which step the default bootenv is set, and/or used.

greg
2017-08-03 22:47
only when the machine is manually created by the user without a bootenv in the specified JSON.
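The selection rules described in this exchange can be summarized in a small sketch. This is illustrative shell, not DRP's actual implementation; the function name `resolve_bootenv` and its arguments are hypothetical, and the example values assume the defaults mentioned above (unknown set to discovery, default set to sledgehammer).

```shell
#!/bin/sh
# resolve_bootenv KNOWN REQUESTED DEFAULT_ENV UNKNOWN_ENV
# KNOWN is "yes" if a machine object exists, anything else otherwise.
# REQUESTED is the bootenv named in the machine object ("" if none was given).
resolve_bootenv() {
  known=$1
  requested=$2
  default_env=$3
  unknown_env=$4
  if [ "$known" != "yes" ]; then
    # Machine is not known: it boots the unknown bootenv.
    echo "$unknown_env"
  elif [ -n "$requested" ]; then
    # Machine was created with an explicit bootenv.
    echo "$requested"
  else
    # Machine was created without a bootenv: the default applies.
    echo "$default_env"
  fi
}

resolve_bootenv no "" sledgehammer discovery
resolve_bootenv yes ubuntu-16.04-install sledgehammer discovery
resolve_bootenv yes "" sledgehammer discovery
```

The third call is the only case where the default bootenv comes into play, which matches the answer above: a machine created by hand with no bootenv in its JSON.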

2017-08-03 22:48
Aah, so not through webUI

2017-08-03 22:48
in the webUI you're forced to choose a env

greg
2017-08-03 22:48
well - the UI knows to put something.

2017-08-03 22:53
Then the super last thing: how much memory does starting sledgehammer take? I'm a little constrained memory-wise in my small POC setup. I can only give 768M to a test VM which does PXE, though it crashes with a low-mem warning.

greg
2017-08-03 22:54
Yeah - that is probably too small. 1.5 G or so. Sledgehammer is "bloated" with tools and environment pieces to identify components on real servers. So, it is a little heavier than what you are trying it on.

greg
2017-08-03 22:54
Sorry

2017-08-03 22:54
That's okay, was just wondering

2017-08-03 22:54
Thanks, so far again. Bedtime for me now

greg
2017-08-03 22:55
Later

zehicle
2017-08-03 23:15
listening to your vm

zehicle
2017-08-03 23:18
no need to call - thanks for the detailed voicemail

lae
2017-08-04 00:20
does sledgehammer not have serial console support?

greg
2017-08-04 00:26
it does, but you need to tell it which one.

greg
2017-08-04 00:27
Or really, it doesn't, but it will be configurable. It depends upon which version of templates you have.

greg
2017-08-04 00:27
you need to put console=... on the bootparams line. The problem is that it varies depending upon your environment. Sooo, you need to add it in your bootenvs.

lae
2017-08-04 00:27
just did a fresh install of 3.0.5 and tried adding console=ttyS1,115200n8 to sledgehammer, but nothing shows up :<

lae
2017-08-04 00:28
(and bootup stopped showing on tty0)

greg
2017-08-04 00:28
okay - you may need to add it to discovery as well.

lae
2017-08-04 00:28
yeah, I made the changes to discovery, too, though once the machine's been discovered it just goes straight to sledgehammer I assume

greg
2017-08-04 00:28
yes

greg
2017-08-04 00:29
unless in Packet...

lae
2017-08-04 00:29
this is on-premise

greg
2017-08-04 00:29
should be okay then. is it just sledgehammer console?

lae
2017-08-04 00:30
the existing debian install on the system I'm testing it on outputs to serial console fine, if that's what you're asking

greg
2017-08-04 00:31
well - I've been playing with it and it is strange. I haven't found a set of options that works consistently everywhere.

greg
2017-08-04 00:32
I have a coming change that has this as bootparam in sledgehammer and discovery.

greg
2017-08-04 00:32
```rootflags=loop root=live:/sledgehammer.iso rootfstype=auto ro liveimg rd_NO_LUKS rd_NO_MD rd_NO_DM provisioner.web={{.ProvisionerURL}} rs.uuid={{.Machine.UUID}} rs.api={{.ApiURL}} -- {{ if .ParamExists \"kernel_console\"}}{{.Param \"kernel_console\" }}{{ end }}```

greg
2017-08-04 00:33
Putting it after the -- seems to be required, and then console=ttyS1,115200n8 or console=ttyS0,115200n8 seems to work.

greg
2017-08-04 00:33
Now, BIOS settings can mess with this, though if you have an Ubuntu install that works, the console line from that should work.

greg
2017-08-04 00:34
For ubuntu I had to put the console string on both sides of the --

lae
2017-08-04 00:34
will try in a bit

lae
2017-08-04 01:06
no cigar

lae
2017-08-04 01:12
hm

lae
2017-08-04 01:15
oh, this machine's actually ttyS2 - never mind then, it works. (looks like I was looking at the wrong item in my inventory whose serial-over-LAN console is ttyS1, so I got confused)
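Since the failure here came down to the serial port number, a tiny helper that builds the value from a port number can make that choice explicit per machine. This is a minimal sketch; the function name `console_arg` is hypothetical, and the fixed `n8` suffix (no parity, 8 data bits) simply mirrors the strings quoted above.

```shell
#!/bin/sh
# console_arg PORT [BAUD]
# Prints a kernel console argument for serial port ttyS<PORT>.
console_arg() {
  port=$1
  baud=${2:-115200}
  echo "console=ttyS${port},${baud}n8"
}

# The machine in this thread turned out to be on ttyS2.
console_arg 2
```

The resulting string is what would go after the `--` in BootParams, or into a per-machine parameter such as the `kernel_console` one used in the template quoted earlier.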

greg
2017-08-04 01:15
whew

2017-08-04 03:30
Working on t-shirt design - please feel free to vote https://99designs.com/contests/poll/j4amxx

2017-08-04 12:15
@zehicle, I am getting the below error when I tried to upload the iso image using the rebar script:
[root@test10 ~]# ./rebar provisioner isos upload CentOS-6.8-x86_64-bin-DVD1.iso as CentOS-6.8-x86_64-bin-DVD1.iso
2017/08/04 09:45:41 ID not set
panic: ID not set
goroutine 1 [running]:
log.Panic(0xc4200dfa48, 0x1, 0x1)
    /usr/lib/go/src/log/log.go:322 +0xc0
github.com/digitalrebar/digitalrebar/go/rebar-api/api.(*Client).UrlTo(0xc420054be0, 0xa6c620, 0xc420140140, 0x0, 0x0, 0x0, 0xa64900, 0xc42014c300)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/rebar-api/api/client.go:87 +0xb4
github.com/digitalrebar/digitalrebar/go/rebar-api/api.(*Client).Read(0xc420054be0, 0xa6c620, 0xc420140140, 0x0, 0x0)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/rebar-api/api/types.go:132 +0x6a
github.com/digitalrebar/digitalrebar/go/rebar-api/api.(*Client).Fetch(0xc420054be0, 0xa6c620, 0xc420140140, 0x0, 0x0, 0x0, 0x24)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/rebar-api/api/types.go:145 +0x9c
github.com/digitalrebar/digitalrebar/go/rebar-api/api.Session(0x88068a, 0x16, 0x0, 0x0, 0x0, 0x0, 0x73da86, 0xc4200fbcb0, 0xc42011d3d0)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/rebar-api/api/client.go:184 +0x34d
main.main.func1(0xc420138d80, 0xc42011d500, 0x3, 0x3)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/rebar-api/rebar/main.go:282 +0x2fb
github.com/digitalrebar/digitalrebar/go/vendor/github.com/spf13/cobra.(*Command).execute(0xc420138d80, 0xc42011d3b0, 0x3, 0x3, 0xc420138d80, 0xc42011d3b0)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/vendor/github.com/spf13/cobra/command.go:619 +0x4d5
github.com/digitalrebar/digitalrebar/go/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xa941c0, 0xc4200dff60, 0xa94301, 0xc4201438c0)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/vendor/github.com/spf13/cobra/command.go:722 +0x339
github.com/digitalrebar/digitalrebar/go/vendor/github.com/spf13/cobra.(*Command).Execute(0xa941c0, 0xc4200dff58, 0x1)
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/vendor/github.com/spf13/cobra/command.go:681 +0x2b
main.main()
    /home/victor/gocode/src/github.com/digitalrebar/digitalrebar/go/rebar-api/rebar/main.go:353 +0x35d
[root@test10 ~]#

greg
2017-08-04 12:40
Make sure that you have REBAR_KEY and REBAR_ENDPOINT set to valid values.

greg
2017-08-04 12:40
or pass -E and -U and -P on the cli
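A quick pre-flight check along these lines can catch the missing-variable case before the CLI panics. This is a minimal sketch that only tests whether the two variables named above are non-empty; it does not validate their contents, and the function name `check_rebar_env` is hypothetical.

```shell
#!/bin/sh
# check_rebar_env: succeed only if both variables are set and non-empty.
check_rebar_env() {
  missing=0
  for name in REBAR_KEY REBAR_ENDPOINT; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_rebar_env; then
  echo "rebar environment looks set"
else
  echo "set the variables above, or pass -E, -U and -P on the command line"
fi
```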

2017-08-04 12:47
@deepuashokan85 if you are focused on provisioning, we strongly recommend starting with the DR Provisioner (https://github.com/digitalrebar/provision) instead of the full Rebar stack. Provision is specifically designed for the DHCP/PXE automation phases and can be integrated back into Rebar.

lae
2017-08-05 01:10
congrats on the 1000th commit for provision btw

greg
2017-08-05 02:33
we've been flying. :slightly_smiling_face:

zehicle
2017-08-05 03:24
!! Wow. That's a big milestone

greg
2017-08-05 04:05
Warning - swagger committed a change today that broke our swagger generation. I'm looking at how to fix it now.

greg
2017-08-05 04:05
This affects people trying to build tip.

greg
2017-08-05 16:46
back working

lae
2017-08-07 20:31
does provision just extract the `IsoFile`?

lae
2017-08-07 20:32
I have some ramdisks and kernels for some in house operating systems that I need to create bootenvs for - I'm guessing I just need to tar them up and define Initrd and Kernel correctly

greg
2017-08-07 20:33
Yes - it explodes it in the tftpboot directory so it can be served.

greg
2017-08-07 20:33
Yes. tars are valid "isos" - look at the sledgehammer tarball

lae
2017-08-07 20:33
yeah, that's where I got my assumption from. thanks
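Packaging a kernel and ramdisk for this purpose could look roughly like the sketch below. It assumes both files sit in the same directory; the function name `make_bootenv_tar` and the file names are hypothetical, and the resulting member names would still have to match the Kernel and Initrd values in the bootenv.

```shell
#!/bin/sh
# make_bootenv_tar OUT_TAR SRC_DIR KERNEL INITRD
# Creates OUT_TAR containing KERNEL and INITRD (both relative to SRC_DIR)
# at the top level of the archive.
make_bootenv_tar() {
  out=$1
  src_dir=$2
  kernel=$3
  initrd=$4
  tar -cf "$out" -C "$src_dir" "$kernel" "$initrd"
}

# Example (hypothetical paths):
#   make_bootenv_tar /tmp/myos.tar /srv/images/myos vmlinuz initrd.img
#   tar -tf /tmp/myos.tar
```

Passing an absolute path for the output keeps it unambiguous where the archive is written.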

lae
2017-08-09 21:32
How can I configure DRP so that machines boot locally by default?

lae
2017-08-09 21:32
``` $ drpcli prefs set unknownBootEnv local Error: BootEnv local cannot be used for the unknownBootEnv ```

lae
2017-08-09 21:33
and setting defaultBootEnv to local doesn't seem to actually set it to be local

greg
2017-08-09 21:33
local => ignore

greg
2017-08-09 21:33
local is the machine-specific bootenv

greg
2017-08-09 21:33
ignore is the discovery bootenv

lae
2017-08-09 21:34
ah

lae
2017-08-09 21:34
Thanks

2017-08-10 00:31
Greetings folks! I discovered that Canonical broke the quickstart process by releasing 16.04.3, and opened a bug and pull request to (hopefully) address. Let me know if I need to sign paperwork or whatnot.

greg
2017-08-10 01:17
Thanks. I'll pull it shortly just so it is in the tree

greg
2017-08-10 01:19
I have a set of changes coming that move content into a separate repo to handle changes like this more cleanly. I have fixed it there already. That repo has automated tests to catch this error and the like. We are trying to get ahead of these issues.

greg
2017-08-10 01:39
@edolnx - my comments are for DRP. I've pulled it into DR.

wdennis
2017-08-10 16:43
Hi @greg - updated to latest stable this morning (v3.0.5) and I'm experiencing problems with both UI and drpcli

wdennis
2017-08-10 16:44
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F6MRS9R0E/image_uploaded_from_ios.jpg and commented: Here is what I see from drpcli:

lae
2017-08-10 16:44
ah

lae
2017-08-10 16:44
moving assets to a separate repo?

lae
2017-08-10 16:45
(I had forked it to our local git repo so I could modify assets/)

wdennis
2017-08-10 16:45
UI just doesn't connect in browser...


greg
2017-08-10 16:46
list -> show

wdennis
2017-08-10 16:47
Aha, that works...

greg
2017-08-10 16:47
@lae - yeah - we are having a hard time supporting everything and updating everything. So we are going to split them into separate repos with more version control stuff and layering.

wdennis
2017-08-10 16:47
Should wrong keyword cause an abort tho?

greg
2017-08-10 16:47
no

greg
2017-08-10 16:47
:slightly_smiling_face:

wdennis
2017-08-10 16:48
OK, QA hat on here...

wdennis
2017-08-10 16:49
If I type 'drpcli bootenvs bogus ...' I do get a Usage: reply

greg
2017-08-10 16:49
yeah - list may be doing something "strange". I have CLI tests for most of it.

wdennis
2017-08-10 16:50
'drpcli bootenvs list ...' causes the panic

greg
2017-08-10 16:50
drpcli bootenv list

greg
2017-08-10 16:50
works?

greg
2017-08-10 16:50
I know what is going on. I'll fix it and add a test.

greg
2017-08-10 16:50
The list command lets you do index filtering.

wdennis
2017-08-10 16:50
Yes, works

greg
2017-08-10 16:51
```drpcli bootenvs list Available=true```

greg
2017-08-10 16:51
I don't test for the case where you don't send an equals.
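The missing guard amounts to checking that each filter argument has the `key=value` shape before using it. The sketch below shows that check in shell purely as an illustration; it is not drpcli's code, and the function name `is_filter` is hypothetical.

```shell
#!/bin/sh
# is_filter ARG: succeed if ARG looks like key=value with a non-empty key.
is_filter() {
  case $1 in
    ?*=*) return 0 ;;
    *)    return 1 ;;
  esac
}

for arg in "Available=true" "bogus"; do
  if is_filter "$arg"; then
    echo "ok: $arg"
  else
    echo "rejected (expected key=value): $arg"
  fi
done
```

Rejecting the malformed argument with a usage message is the behavior asked for above, rather than a panic.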

wdennis
2017-08-10 16:51
OK, so now about the UI...

greg
2017-08-10 16:51
yeah

wdennis
2017-08-10 16:52
I do see TCP listeners on port 8091 and 8092

greg
2017-08-10 16:52
https://<ip>:8092

wdennis
2017-08-10 16:52
But if I try that URL, no go...

greg
2017-08-10 16:53
that one is strange

greg
2017-08-10 16:54
that address is part of that host? ```ip a``` shows it?

wdennis
2017-08-10 16:55
Yes, confirmed using correct IP

greg
2017-08-10 16:55
cool - okay - firewall rules?


greg
2017-08-10 16:57
iptables getting in the way?

wdennis
2017-08-10 16:58
Maybe that's it - had full DR running on this node previously, and Docker uses iptables. When I did an 'iptables -L' I saw 1,000 rules :)


wdennis
2017-08-10 16:59
You know the command to del all chains & flush again?

wdennis
2017-08-10 17:00
I have also removed docker-engine* pkgs now

greg
2017-08-10 17:02
iptables -F

greg
2017-08-10 17:02
is what I do

greg
2017-08-10 17:02
I think.

wdennis
2017-08-10 17:03
N/m, sorted it - combo of -F and -X flags

wdennis
2017-08-10 17:04
And now I have a UI :)

greg
2017-08-10 17:04
Yeah

wdennis
2017-08-10 17:04
(And found a bug)

greg
2017-08-10 17:04
Yep - I'll add it to my list.

wdennis
2017-08-10 17:04
Oh, tabs on UI now - slick

greg
2017-08-10 17:05
Which one did you get?

greg
2017-08-10 17:05
it is undergoing a lot of changes and will be different. More on that later.

wdennis
2017-08-10 17:05
3.0.5 (stable)

greg
2017-08-10 17:05
yeah, but in my mind that doesn't help. :slightly_smiling_face:

lae
2017-08-10 17:06
from april

wdennis
2017-08-10 17:06
Well, it's better than "one long page with everything" design :)

lae
2017-08-10 17:07
oh yeah, would ya suggest to run tip

wdennis
2017-08-10 17:08
I'm actually using it in production so rather be on 'stable'

greg
2017-08-10 17:08
stable is fine.

greg
2017-08-10 17:09
tip is usually stable (the unit tests are good for the core pieces). The templates and such not as much. Hence, some of the changes coming.

greg
2017-08-10 17:10
Tip is passing unit tests, and has one caveat on upgrade. Subnets can be enabled/disabled now without deleting them. The data migration makes them off by default. :disappointed:

lae
2017-08-10 17:10
I'm running my own version of the templates - unless you mean handling of the templates has been changed a bit

greg
2017-08-10 17:10
templates are good - the actual values in templates are the issue.

greg
2017-08-10 17:11
We are working on a content layering system that hopefully will be simple but let you have your own content versions that we can still update. Almost there.


wdennis
2017-08-10 17:15
Still getting this when I click "API Help" links in UI tho

greg
2017-08-10 17:17
okay - bug - that swagger UI isn't working again.

lae
2017-08-10 17:19
you'll need to pass in the current IP, @wdennis

lae
2017-08-10 17:19
that's a CORS issue

greg
2017-08-10 17:20
oh - yeah - the IP in the browser URL and the IP in the form need to be the same.

greg
2017-08-10 17:20
That is the real bug to fix.

wdennis
2017-08-10 17:22
Ah

zehicle
2017-08-10 23:53
@wdennis if you can give me a little summary of the iptables fix, I'll make sure it gets into the FAQ

lae
2017-08-11 00:50
@greg curious, what does the ce- prefix stand for on the templates in provision-content?

lae
2017-08-11 01:02
also I'm not really a fan of the inconsistent syntax within DRP/provision-content - like strings quoted in sledgehammer.yml but not in other bootenvs, and some variables using hyphens while others use underscores. minor issue, but it makes it confusing to figure out what style to stick to

greg
2017-08-11 01:10
Yeah. I'm on phone. Will need to type longer answer

wdennis
2017-08-11 13:29
@zehicle - Basically it's: iptables -F; iptables -X; iptables -t nat -F; iptables -t nat -X; iptables -t mangle -F; iptables -t mangle -X (-F flushes the chain rules, -X removes all but default chains)
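For the FAQ, the same sequence can be written as a loop over the three tables. The sketch below only prints the commands so they can be reviewed first; the function name `print_flush_commands` is hypothetical. Note that `iptables -F` with no `-t` operates on the `filter` table, so naming it explicitly is equivalent.

```shell
#!/bin/sh
# print_flush_commands: emit the flush/delete commands for each table.
# -F flushes all rules in the table's chains.
# -X deletes the user-defined (non-default) chains.
print_flush_commands() {
  for table in filter nat mangle; do
    echo "iptables -t $table -F"
    echo "iptables -t $table -X"
  done
}

print_flush_commands
```

Running the printed commands needs root and removes every rule in those tables, so this is only appropriate on a host where nothing else (such as a still-installed Docker) depends on them.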

greg
2017-08-12 21:31
@lae ce- stands for community edition

greg
2017-08-12 21:33
With regard to the quotes and not, the underscores and hyphens, some is laziness, some is ruby-isms.

greg
2017-08-12 21:36
The original components operated in a dynamic Ruby environment, and hyphens and underscores had to do with class separators. It is not an issue now. These are independent from the DR content now.

greg
2017-08-12 21:37
A clean up pass and consistency pass should be done.

lae
2017-08-13 06:30
Ah I see

lae
2017-08-13 06:32
@greg I can submit a PR if you'd like - I did one pass regarding this for my company's copy already

greg
2017-08-13 15:06
okay - that would be great.

2017-08-14 18:12
Hi

zehicle
2017-08-14 18:13
hello

2017-08-14 18:15
Need a help.

2017-08-14 18:15
I am trying to install digital rebar

2017-08-14 18:15
and getting error as

2017-08-14 18:15
TASK [gem install kvm slaves] **************************************************************************************************************** ...................fatal: [149.56.217.200]: FAILED! => {"changed": true, "cmd": ["gem", "install", "json", "net-http-digest_auth"], "delta": "0:00:00.111974", "end": "2017-08-14 21:46:42.611851", "failed": true, "rc": 1, "start": "2017-08-14 21:46:42.499877", "stderr": "ERROR: Loading command: install (LoadError)\n cannot load such file -- zlib\nERROR: While executing gem ... (NameError)\n uninitialized constant Gem::Commands::InstallCommand", "stderr_lines": ["ERROR: Loading command: install (LoadError)", " cannot load such file -- zlib", "ERROR: While executing gem ... (NameError)", " uninitialized constant Gem::Commands::InstallCommand"], "stdout": "", "stdout_lines": []} to retry, use: --limit @/root/digitalrebar/deploy/digitalrebar.retry

2017-08-14 18:16
what would be the issue?

2017-08-14 18:22
Any help will be appreciated

2017-08-14 23:06
@ukris7_twitter could you provide more information about your environment and how you started to run the script?

2017-08-14 23:07
Also, we are STRONGLY recommending people start with Digital Rebar Provision in all use cases. This is much easier to get running initially and will help validate your environment before you bring up the more complex dockerized Digital Rebar environment.

lae
2017-08-15 16:20
@greg reading/editing BootParams is kind of a pain with the current multiline strings, what do you think about having each param on a newline?

lae
2017-08-15 16:20
e.g ``` BootParams: ' priority=critical console-tools/archs=at console-setup/charmap=UTF-8 console-keymaps-at/keymap=us popularity-contest/participate=false passwd/root-login=false keyboard-configuration/xkb-keymap=us netcfg/get_domain=unassigned-domain console-setup/ask_detect=false debian-installer/locale=en_US.utf8 console-setup/layoutcode=us keyboard-configuration/layoutcode=us netcfg/dhcp_timeout=120 netcfg/choose_interface=auto url={{.Machine.Url}}/seed netcfg/get_hostname={{.Machine.Name}} root=/dev/ram rw quiet -- {{if .ParamExists "kernel-console"}}{{.Param "kernel-console"}}{{end}} ' ```

greg
2017-08-15 16:49
I'm fine with it as long as it posts in. I mean need to change processing to remove the newlines. We may do that today, but I'd have to look

lae
2017-08-15 16:57
I don't think you need to make any changes to provision (I assume you meant s/mean/may/ there) - you already have newlines in some existing profiles in provision and it works fine - I also just tested this particular change and the pxelinux.cfg append line gets generated correctly

lae
2017-08-15 17:00
In other news, I opened https://github.com/digitalrebar/provision-content/pull/14 regarding style when you get a chance

greg
2017-08-15 17:03
okay - cool - I'll look at it. :slightly_smiling_face:

2017-08-15 20:38
Hi

zehicle
2017-08-15 20:40
@lae are you interested in also adding style recommendations to the docs to match? that would help codify your changes

2017-08-15 20:40
@sherinsha hello

2017-08-15 20:40
Hi Rob

2017-08-15 20:41
TASK [Get Docker] ********************************************************************************************************** fatal: [149.56.217.200]: FAILED! => {"changed": false, "dest": "/tmp/docker.sh", "failed": true, "msg": "Request failed: <urlopen error [Errno 1] _ssl.c:492: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>", "state": "absent", "url": "https://get.docker.com/"} to retry, use: --limit @/root/digitalrebar/deploy/digitalrebar.retry PLAY RECAP ***************************************************************************************************************** 149.56.217.200 : ok=9 changed=3 unreachable=0 failed=1

2017-08-15 20:41
I'm stuck at this stage while doing the Digital Rebar installation

2017-08-15 20:43
anyone experienced this before

lae
2017-08-15 20:43
@zehicle sure, how do you want me to go about that? `docs/` folder with a `style.md` or just a quick `README`?

lae
2017-08-15 20:43
or a contributions doc

zehicle
2017-08-15 20:50
we need a doc/dev directory - I'll see about consolidating that. in the meantime, could you add doc/dev-style.rst?

zehicle
2017-08-15 20:51
the docs are autogenerated by read-the-docs, so all rst

2017-08-15 22:56
@sherinsha try installing Docker first, then run the script. something about your Docker install is failing

2017-08-15 22:57
okay

2017-08-15 22:57
is there any way to reset the rebar user password via SSH ?

2017-08-15 22:58
you can use the rebar CLI to do that on the system or remote

2017-08-15 22:59
do u remember the exact command for the same

2017-08-15 23:44
Thanks. Managed to reset the user password

2017-08-16 01:14
Digital Rebar install done. How can I deploy a bare metal server through it? All the online documentation is confusing. Is there any simple step-by-step documentation?

zehicle
2017-08-16 02:00
you need to configure your admin network to be on the right NIC and have a host dhcp range. There are some videos for that.

2017-08-16 02:01
but couldnt find a straight forward documentation in any

zehicle
2017-08-16 02:02
HOWEVER, if your primary goal is to provision metal machines, we strongly recommend starting with Provision

2017-08-16 02:04
cd ~/digitalrebar/deploy workloads/add-provider.sh --admin-ip=IP --provider=Baremetal

2017-08-16 02:04
is this to create provider ?

zehicle
2017-08-16 02:05
you don't have to create the baremetal provider - it's there by default if you installed with con-provisioner & con-dhcp

lae
2017-08-16 02:05
Made an Arch Linux package for drpcli https://aur.archlinux.org/packages/drpcli

2017-08-16 02:06
@sherinsha really... start with Digital Rebar Provision. https://github.com/digitalrebar/provision

2017-08-16 02:06
okay

2017-08-16 02:07
it only takes 5 minutes to get running and will do all the metal provisioning much more easily. What use case are you trying to accomplish?

2017-08-16 02:07
okay ty

2017-08-16 02:08
once you have that working, it's possible to feed into Digital Rebar if you need features from that orchestration

2017-08-16 02:08
okay

2017-08-16 02:09
however, with v3.1 DRP coming out very soon (look at the master branch on tip) most of the primary workflows can be done in DRP

2017-08-16 02:09
cool

zehicle
2017-08-16 02:10
@lae sweet! thanks

lae
2017-08-16 02:14
Do you think we could get tarball releases for DRP? zip seems kind of unconventional

lae
2017-08-16 02:14
(in consideration for 3.1)

greg
2017-08-16 05:15
We could, but zip works on all the platforms and tar / bsdtar uncompresses them and is required to be installed.

2017-08-16 11:50
hello to everyone!

2017-08-16 11:52
could anyone please help me? I am testing digitalrebar with KVM virtual machines. When I boot a VM from the network I get stuck on mounting stage2.img with the following error: failed to mount xxxxx.squashfs as Stage2 initramfs

2017-08-16 11:52
and then i drop out into shell

greg
2017-08-16 13:52
Are you using DRP or DR?

greg
2017-08-16 13:54
Actually, for either case, make sure that the VM can reach the admin node. Check how many nics you are booting and make sure that you are getting the nic you expect. For example, vagrant adds an extra nic and that gets confusing.

2017-08-16 15:10
Sorry, redeployment of digital rebar nodes solved the problem. It created a stage2.img file of a different size somehow and now everything works. Thank you for the response.

2017-08-16 15:11
Sorry, redeployment of digital rebar nodes solved the problem.

2017-08-16 15:16
great! Glad to know that it's working

shane
2017-08-16 15:34
has joined #json

vishwanathj
2017-08-16 16:35
has joined #json

greg
2017-08-16 16:52
@lae - Your pull request worked in unit tests and is in. :slightly_smiling_face: Thanks! Now, to get DRP to use that repo. :slightly_smiling_face:

lae
2017-08-16 17:22
yay

lae
2017-08-16 17:26
@greg so - I noticed that in the bootenvs for Debian and Ubuntu in digitalrebar's assets (which carried over into DRP's ubuntu bootenv) set OS.Family to debian and ubuntu, respectively - to me this is kind of confusing since I think of them as both in the debian family (similar to RHEL/Centos/Fedora all being of the redhat family) - ansible sees it this way and uses a separate variable, `ansible_distribution` to distinguish the different distros in the same family. I also note that in `tools/install.sh` the same semantics that ansible uses are also used https://github.com/digitalrebar/provision-content/blob/master/tools/install.sh#L93

lae
2017-08-16 17:27
Does DRP use `OS.Family` itself or do only the templates use it?

greg
2017-08-16 17:28
It is mostly info. I don't think it is used anywhere. It could be expanded in templates.

lae
2017-08-16 17:29
Ah okay - so yeah, I was thinking of expanding usage of OS.Family into templates - for example making it simple to have different partitioning templates while retaining the primary preseed/kickstart files - and that the partitioning templates would be identified by their layout rather than being a preseed or kickstart template - and within those partitioning templates we test for OS.Family

lae
2017-08-16 17:30
also actually, it is used in the preseed template right now to test whether or not we're installing Debian or Ubuntu

greg
2017-08-16 17:36
okay - I see what you are going to do, I think.

greg
2017-08-16 17:42
We haven't tested this, but you could also do something like this:

greg
2017-08-16 17:43
{{$partlayout := .Param "machine-part-config"}} {{template $partlayout .}}

greg
2017-08-16 17:43
where machine-part-config would be a string parameter that contains the name of a template that represents the partitioning layout you want.

lae
2017-08-16 17:44
I was actually in the midst of typing something of a similar fashion out but got distracted by work

lae
2017-08-16 17:44
but basically yeah

lae
2017-08-16 17:44
and then have a default part-config

greg
2017-08-16 17:45
This is the one area where the job/task system doesn't help. The kickstart / preseed tweaks.

greg
2017-08-16 17:46
This would be a good addition to the community templates.

lae
2017-08-16 17:47
(this is actually something I used to do with Cobbler previously, ha - although I had separate centos/debian templates)

greg
2017-08-16 17:52
yeah - we wanted to make sure something like this was functional to make it easier to move over from cobbler.

2017-08-16 18:37
Is there a way to join already existing VM Linux "Node" thats riding on vmware?

greg
2017-08-16 19:04
to which system? DRP or DR. Yes. kinda.

2017-08-16 19:04
dr is what I installed

greg
2017-08-16 19:07
The quick short answer is not really in a run this command form.

2017-08-16 19:08
can use pxe instead?

greg
2017-08-16 19:11
So - one option would be to pxe boot the machine to sledgehammer. That will create the machine in DR. copy off the authorized_keys file in the root directory. Then put that in the root directory of your machine after you reboot back to its OS. Then you can mark the machine available and it will be operated against. That is kinda slow.

greg
2017-08-16 19:14
There is also the digitalrebar/deploy/scripts/join_rebar.sh - but it is old. It does mostly what needs to be done.

2017-08-16 19:14
yea I tried tweaking it a bit, the api has changed It looks like, I was close but wanted to find out another way,

2017-08-16 19:15
I'll try and pxe the vm and see if that works

2017-08-16 20:36
@hornjason what are you trying to make happen?

2017-08-16 20:37
im trying to spin up vm's inside vmware...to do a POC ... instead of metal. thats all

2017-08-16 20:37
then run install k8s/openstack

2017-08-16 20:38
that install has decayed - will not work out of the box at this point

2017-08-16 20:38
sorry to tell you

2017-08-16 20:39
np, at least I know now, I noticed you had a video using osx/virtual box. is this a usable option at this time

2017-08-16 20:39
for testing provisioning, yes.

2017-08-16 20:40
right ... let me give that a try and play with join_rebar see what I can do

2017-08-16 20:41
we're about to drop v3.1 of DR Provision - that will include a simple dynamic inventory that can install k8s via kubespray

2017-08-16 20:42
I'll upload the doc into master for DR Provision and post the link

2017-08-16 20:49
https://github.com/digitalrebar/provision/blob/master/doc/integrations/ansible.rst

2017-08-16 20:50
excellent, i just registered a already running vm like join_rebar.sh does. hopefully this will work as well

2017-08-16 20:50
thanks rob

2017-08-16 20:52
ahh so this will add a provisioner so you can deploy from rebar

greg
2017-08-18 18:48
Hi all. We are in the process of changing how model validation and a few other things work internally. The master of the tree will have some issues for a little while. tip works. stable is unchanged. This should be back shortly.

monkey
2017-08-18 18:52
has joined #json

2017-08-21 13:48
Hi all, I'm playing with Digital Rebar for the first time, it looks like an awesome tool! I've watched a few videos and now I'm following the guide http://provision.readthedocs.io/en/stable/doc/os-support/linuxkit.html I'm having an issue right out of the gate though, the first thing to run is `curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/master/tools/install.sh | bash -s -- --isolated --rs-version=tip install` which gives me back `sudo ./dr-provision --static-ip=10.10.1.10 --file-root=/home/since/rebar-digital/lk-dr-trial/dr-provision/drp-data/tftpboot --data-root=drp-data/digitalrebar --local-store="" --default-store="" &` but when I run that I get back `unknown flag `local-store'` :worried:

greg
2017-08-21 14:10
Oh wait - just fixed that last night or Saturday night. I think. Should be local-content and default-content. May need to blow away dir and start over. Sorry. Tip has been in flux the last few days. Should be back together.

greg
2017-08-21 14:11
Of course if you tried the install stable and got this message that is a bug.

2017-08-21 14:16
@zehicle I just started clean, thought I'd try the most basic thing first, so I went to http://provision.readthedocs.io/en/stable/doc/quickstart.html and ran the first command shown `curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/master/tools/install.sh | bash -s -- --isolated install` which gave me back `sudo ./dr-provision --static-ip=10.10.1.10 --file-root=/home/since/drp-data/tftpboot --data-root=drp-data/digitalrebar --local-store="" --default-store="" &` and if I run that then I also get `unknown flag `local-store'`

2017-08-21 14:18
I'm guessing the version on the "Quick Start" is stable, so is this a bug?

zehicle
2017-08-21 14:18
we've changed some flags in tip

zehicle
2017-08-21 14:18
so the quick start script is straddling

zehicle
2017-08-21 14:19
the simplest solution is to `./dr-provision --help` to make sure the flags are right
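That check can be scripted so a generated command line is only used when the binary actually knows its flags. This is a minimal sketch that greps the help text for the flag name; the function name `supports_flag` is hypothetical, and a plain substring match like this can give false positives for flags that share a prefix.

```shell
#!/bin/sh
# supports_flag BINARY FLAG
# Succeeds if the help output of BINARY mentions --FLAG.
supports_flag() {
  binary=$1
  flag=$2
  "$binary" --help 2>&1 | grep -q -e "--${flag}"
}

# Only runs if dr-provision is present in the current directory.
if [ -x ./dr-provision ]; then
  for flag in local-store local-content; do
    if supports_flag ./dr-provision "$flag"; then
      echo "supported: --$flag"
    else
      echo "not supported: --$flag"
    fi
  done
fi
```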

2017-08-21 14:25
Thanks, looks like those flags have gone.

zehicle
2017-08-21 14:32
could you check ./dr-provision --version?

zehicle
2017-08-21 14:34
the quickstart should be using 3.0 by default. we're just about to release 3.1 from tip to stable, so there may be some accidental leakage.

greg
2017-08-21 14:40
It is a bug. Too aggressive on my flag littering.

greg
2017-08-21 14:40
Just don't use local-store/local-content and default-store/default-content on 3.0 systems.

greg
2017-08-21 14:41
It is a bug in the install script.

greg
2017-08-21 14:42
Actually, instead of me fixing it, the stable tree and docs should use: ```curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/stable/tools/install.sh```

greg
2017-08-21 14:43

greg
2017-08-21 14:44
Hopefully, that will give you enough to deal with the issue:

greg
2017-08-21 14:44
Three things should work:

greg
2017-08-21 14:44
1. change initial curl to stable from master.

greg
2017-08-21 14:44
2. Change the initial command to add ```--drp-version=tip``` - this would get the tip which is functionally equivalent

greg
2017-08-21 14:45
to stable.

zehicle
2017-08-21 14:45
@greg should I update the quickstart to show stable & tip install choices?

greg
2017-08-21 14:45
Yeah.

greg
2017-08-21 14:45
We shouldn't have the install script cross the "streams" as it were.

greg
2017-08-21 14:46
we tried and it works for point releases, but will get hairy as we go forward.

greg
2017-08-21 14:46
Since we can access the trees, it is better to have a consistent command line.

greg
2017-08-21 14:46
This is stable install: ```curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/stable/tools/install.sh | bash -s -- --isolated install```

greg
2017-08-21 14:47
This is tip install: ```curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/tip/tools/install.sh | bash -s -- --isolated install --drp-version=tip```
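
The two install lines quoted above differ only in the branch name and the extra `--drp-version=tip` argument. A minimal sketch that prints (and does not run) the matching one-liner; the function name `drp_install_cmd` is hypothetical:

```sh
# drp_install_cmd CHANNEL
# Prints the isolated-install one-liner for CHANNEL ("stable" or "tip"),
# exactly as quoted in the discussion. Nothing is downloaded or executed.
drp_install_cmd() {
    case $1 in
        stable) extra="" ;;
        tip)    extra=" --drp-version=tip" ;;
        *)      echo "unknown channel: $1" >&2; return 1 ;;
    esac
    printf 'curl -fsSL %s/%s/tools/install.sh | bash -s -- --isolated install%s\n' \
        "https://raw.githubusercontent.com/digitalrebar/provision" "$1" "$extra"
}

# Usage:
#   drp_install_cmd stable
#   drp_install_cmd tip
```

Rejecting anything other than `stable` or `tip` keeps `master` from being used by accident.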

greg
2017-08-21 14:48
We should avoid master because it isn't necessarily completely vetted like a tip is. I move tip slightly independently of master. This way straddling features and PRs can be brought together into tip.

zehicle
2017-08-21 14:50
makes sense. I like people making a specific choice and being clear about it

greg
2017-08-21 14:51
and then dropping --isolated install makes it "production"

greg
2017-08-21 14:52
Also I didn't want to fix it because the install.sh script now has different methods for getting content.

greg
2017-08-21 14:52
Tip uses the validated community content repo. stable pulls from assets.

2017-08-21 14:59
@zehicle Thank you! I thought I was being stupid, I'll work through it like you said.

greg
2017-08-21 15:08
We've got to get a better bot. :slightly_smiling_face: Or get everyone into slack. @Simon-Ince - you've been talking to two of us. :slightly_smiling_face:

2017-08-21 15:11
Thank you everybody!

greg
2017-08-21 15:13
More for awareness than praise. @zehicle and @greg will get people's attention.

2017-08-21 15:30
@Simon-Ince happy to talk 1x1 and also pull you over to the Slack channel too

2017-08-22 14:53
Hi there, can anyone help me with a weird issue we are having running dr-provision. It looks like it's only binding to IPv6 ports, and leaving the v4 address alone.


2017-08-22 14:54
I'm getting something that looks like ^^

2017-08-22 14:56
can you provide the start-up parameters and the type of system you are running on? you can use ``./drpcli info get`` to recover the info

2017-08-22 14:57
it would be helpful to know if you specified a static ip too

2017-08-22 14:58
Thanks @zehicle , The startup command I'm running is `./dr-provision --static-ip=10.10.1.10 --file-root=/home/dave/digital-rebar/drp-data/tftpboot --data-root=drp-data/digitalrebar`

2017-08-22 14:59
Nothing fancy, just want it to bind to ip4 address and access the UI

greg
2017-08-22 14:59
That looks normal. The kernel will bind to both by default, and that is what it looks like.

2017-08-22 15:00
I thought `8091` and `8092` were just on v6 addresses?

2017-08-22 15:01
(Unless I'm reading this completely wrong, which is quite possible)

2017-08-22 15:36
@zehicle All sorted.. grumble grumble firewall.. Thanks for your help anyway :-)

2017-08-22 15:37
if you can share the troubleshooting steps, I'll put it into the docs

greg
2017-08-22 15:52
:::8091 is the unspecified IP (both). Great that you've worked it out.
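
For reading listener output like the one discussed here: on Linux, a socket bound to the IPv6 wildcard address normally accepts IPv4 connections too, unless `net.ipv6.bindv6only` is set to 1. A minimal sketch that classifies the "Local Address" column of `netstat` or `ss`; the function name and its three labels are made up for illustration:

```sh
# describe_listen_addr ADDR
# Classifies a local address as printed by netstat or ss:
#   any       wildcard covering IPv6 and, by default on Linux, IPv4 as well
#   any-ipv4  IPv4 wildcard only
#   specific  bound to one particular address
describe_listen_addr() {
    case $1 in
        :::*|'[::]:'*|'*:'*) echo any ;;
        0.0.0.0:*)           echo any-ipv4 ;;
        *)                   echo specific ;;
    esac
}

# Usage:
#   describe_listen_addr :::8091
```

If a port shows as `any` but is still unreachable over IPv4, a firewall is a more likely culprit than the bind address.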

2017-08-22 17:04
Hey Folks - I'm having a heck of a time getting DR up and running. Tried the quickstart but DHCP provisioning wasn't enabled and trying to enable it would fail with ansible errors. Tried to follow the docs, and it all seems to start but I cannot login to the control panel. I feel like I am missing something obvious.

zehicle
2017-08-22 17:06
Dr or Dr provision?

zehicle
2017-08-22 17:08
If you are using ansible, then DRv2. Check out the DR P install. It's much lighter.

zehicle
2017-08-22 17:10
For DR v2, check the docker log, it may not be all the way started if you cannot login

2017-08-22 17:12
Are there docs for DR P?


2017-08-22 17:16
Thanks!

zehicle
2017-08-22 17:16
for boot provision, the provision is much simpler, smaller and easier to understand

zehicle
2017-08-22 17:17
with some of the new additions (tasks, plugins, content) , we think it covers all of the key use cases

2017-08-22 17:19
:+1:

2017-08-22 17:28
setting up a lab cluster of 10 Intel NUCs... doesn't seem to be able to get through pxe with `dr-provision: dr-provision2017/08/22 17:16:14.566078 sending block 0: code=0, error: TFTP Aborted`

2017-08-22 17:29
trying to debug it, but I can't seem to change the preferences in the UI

2017-08-22 17:30
two questions... where are the logs stored and how do I persist the debug levels in preferences?

2017-08-22 17:31
running tip v3.0.4-tip-235-9f07bd96cefa80f66b9d10062f50f1048a7cf8ff

2017-08-22 17:43
we're working on UX - CLI is safer right now to set things, especially for tip

2017-08-22 17:44
are you running this in a container?

vlowther
2017-08-22 17:45
We log to stderr by default -- currently, we don't save logs to disk by ourselves.

2017-08-22 17:45
no... just install.sh without isolated

2017-08-22 17:46
systemd

vlowther
2017-08-22 17:46
We rely on stderr being handled appropriately by your init or container system for persistent logging right now.
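
Since logging goes to stderr, a hand-started process needs its stderr redirected if the log should persist. A minimal sketch, with a hypothetical helper name `run_with_log`; under systemd the journal already captures stderr, so `journalctl -u dr-provision` would be the place to look, assuming the unit is named `dr-provision`:

```sh
# run_with_log LOGFILE COMMAND [ARGS...]
# Runs COMMAND with stderr appended to LOGFILE; stdout is left untouched.
run_with_log() {
    _log=$1
    shift
    "$@" 2>>"$_log"
}

# Usage (the log path is only an example):
#   run_with_log "$HOME/dr-provision.log" ./dr-provision --static-ip=10.10.1.10 &
```

Appending rather than truncating keeps earlier runs in the same file, which helps when comparing behavior across restarts.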

2017-08-22 17:48
cool... I'll set debug using drpcli

2017-08-23 17:01
Does version 3 only use linux kit to deploy k8s ? Is the concept of deployments that was in V2 not available in v3?

2017-08-23 17:10
Yes, the roadmapping of 2 vs. 3 seems completely divergent.

2017-08-23 17:29
Both use Kubespray Ansible. For v3, there's no additional wrappers at this point


zehicle
2017-08-23 17:30
the v3 roadmap started with the Provision/DHCP services being split into stand alone things.

zehicle
2017-08-23 17:31
with the ability to register machines back into v2 using the API driven calls (which is what v2 does too)

zehicle
2017-08-23 17:32
the plan for v3 is that DRP becomes the new metal provider to be on par w/ the v2 cloud providers.

zehicle
2017-08-23 17:32
early feedback from v3 DRP is that it's sufficient for most use cases people wanted

zehicle
2017-08-23 17:33
and using the ansible / terraform tools directly against DRP seems to be a more easily understood way to consume them

2017-08-23 17:34
@tpagden @hornjason I hope that helps

2017-08-23 17:45
@hornjason to answer the specific question: DR P (the current delivered part of v3) has no deployment concept. It does have profiles that can be used to provide grouping mechanisms for machines.

2017-08-23 17:48
@zehicle I apologize, I'm not keeping up on a few things then. So the v2 app catalog, backed with ansible - that's no longer considered to eventually be in v3? I have the DRP v3 running, however, compared to the v2 dashboard... maybe there's some step I'm missing to get back to the functionality that v2 seems to offer.

2017-08-23 18:07
we're working on a new dashboard for v3.1 release - private beta starting very soon of it

2017-08-23 18:08
but some of the features of v2 will be slower to emerge - we've found that people were having trouble getting the whole stack working and wanted something simpler to get started

2017-08-23 19:16
Gotcha, thanks for the response. If I continued to use v2, is there some form of migration or transfer of stewardship to v3 should v3.x emerge to have all the functions?

2017-08-24 05:06
it's time to start having bi-weekly meetings to discuss roadmap and design.

2017-08-24 05:10
@tpagden we have several plans for v2 -> v3 migration and integration. it really depends on which features you are depending on. Some of the annealer use cases (especially for physical workflows) are much simpler in the DR P task systems. I'd be happy to talk here or 1x1 about migration plans - it really depends on which features of v2 you are implementing.

wdennis
2017-08-25 14:00
@zehicle @greg I think publishing roadmap/design would be great - would like to know futures of DRP and DR, and what the proposed use-cases are for each. Have always thought that since DRP used Sledgehammer to initially boot new nodes, it could be used to create an inventory of machines and their hardware/firmware (more useful for those of us using self-owned bare metal.) And after OS installation, have the ability to run an Ansible playbook against the node(s). Or will stuff like this be reserved for "full-on" DR? On a non-technical note, hope all you folks in TX remain safe with the coming hurricane/storms, and recover quickly!

zehicle
2017-08-25 15:31
@wdennis we DO have an Ansible dynamic inventory generator - check out /integrations/ansible and there's even docs for it

zehicle
2017-08-25 15:31
we'll get a 3.1 release notes and then start community meetings back up to draft a roadmap

zehicle
2017-08-25 15:32
(there's a Terraform provider in the works too... that will require 3.1 workflow capabilities).

wdennis
2017-08-25 16:50
@zehicle Thanks, looking fwd to the docs


zehicle
2017-08-26 16:57
note on using VirtualBox - the vmboxnet0 is not created in the system until a machine using it is booted

zehicle
2017-08-26 17:49
ALL - we're getting ready for the v3.1 release from master/tip. if you have time to test & validate functionality, please take some time to play with it. Especially if you have v3.0 bootenvs and templates

2017-08-29 01:54
Please help me updating bootenv. I am setting up one for windows server. There is a bootenv for this os in default deployment of DR. I am trying to add missing files, wimboot file in particular. So I have copied the file into windows-2012r2/install directory, but it is still not found by the provisioner. What actions should be taken after adding missing files for bootenv. Thank you.

greg
2017-08-29 02:56
umm - that is about to be removed because we can't really support it right now. We haven't built or tested it for a long time. It also requires you to have built wimboot images and the like.

greg
2017-08-29 02:56
I really need to put out the next release because it isn't really supported.

2017-08-29 18:29
@lion_kg_twitter Windows support is complex (especially in v2) and we're pulling it out of community resources because it's not something that can be community supported at this time. RackN does provide support for Windows on DRP for customers where we can work 1x1 with people around their specific environments.

2017-08-29 18:32
@lion_kg_twitter reading your question another way... are you just asking about DRP file upload? You should be able to use the CLI files upload and then see the file in the CLI files list

lae
2017-08-29 20:55
@zehicle I think there should maybe be a note, or upgrade task really, for migrating content from `/var/lib/dr-provision/` to `/var/lib/dr-provision/digitalrebar`

greg
2017-08-29 21:05
The old path should have worked still with the old options.

greg
2017-08-29 21:05
hmm - sigh. I'll check

lae
2017-08-30 02:06
Is it intentional for `bootenvs install` to install all templates in the templates directory, despite the bootenv being installed not referencing them?

greg
2017-08-30 02:20
Yes. Templates can ref templates and we don't do on demand loading

zehicle
2017-08-30 14:31
new v3.1 content API allows for multiple items to be bundled together (and also makes them read only so they can be upgraded)

greg
2017-08-30 20:43
@lae - do you think this would be reasonable for a change to install.sh to address the production directory change?
```
if [[ ! -e /var/lib/dr-provision/digitalrebar && -e /var/lib/dr-provision ]] ; then
    sudo mkdir -p /var/lib/dr-provision/digitalrebar
    sudo mv /var/lib/dr-provision/* /var/lib/dr-provision/digitalrebar
fi
```

lae
2017-08-30 21:21
I suppose - although I'm using an Ansible role myself for deploying DRP

lae
2017-08-30 21:21
I'm probably the only one, though

lae
2017-08-30 21:22
that `mv` command will try to move `digitalrebar` into itself, which I believe errors out

lae
2017-08-30 21:23
```
[musee@birdy tmp]$ mv hello/ hello/
mv: cannot move 'hello/' to a subdirectory of itself, 'hello/hello'
[musee@birdy tmp]$ echo $?
1
```

lae
2017-08-30 21:23
yeah

lae
2017-08-30 21:25
you could rename the existing directory (`mktemp`), then recreate `/var/lib/dr-provision/`, then move the renamed directory into the newly created directory as `digitalrebar`
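
The rename-then-recreate approach described above can be sketched as follows. This is an illustration rather than the actual install.sh change: the function name `migrate_data_root` is hypothetical, the service is assumed to be stopped, and the recreated top-level directory gets default ownership and permissions.

```sh
# migrate_data_root ROOT
# Moves the whole contents of ROOT into ROOT/digitalrebar without ever
# moving a directory into itself. Safe to run more than once.
migrate_data_root() {
    root=${1%/}
    # Nothing to do when there is no old install or it is already migrated.
    [ -d "$root" ] || return 0
    [ ! -e "$root/digitalrebar" ] || return 0
    parent=$(dirname "$root")
    # Stage next to ROOT so both moves are renames on the same filesystem.
    stage=$(mktemp -d "$parent/dr-provision-migrate.XXXXXX") || return 1
    mv "$root" "$stage/old" || return 1
    mkdir -p "$root" || return 1
    mv "$stage/old" "$root/digitalrebar" || return 1
    rmdir "$stage"
}

# Usage (as root, with dr-provision stopped):
#   migrate_data_root /var/lib/dr-provision
```

Unlike `mv dir/* dir/digitalrebar`, this also carries dotfiles along, because the directory itself is renamed instead of its glob-expanded contents.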

greg
2017-08-30 21:26
@lae - you are pulling the zip and doing it all yourself?


lae
2017-08-30 21:27
effectively

lae
2017-08-30 21:29
does an install and some configuration afterwards, like removing the default user (if specified) and creating our own admin user

greg
2017-08-30 21:39
nice!

vlowther
2017-08-30 21:43
sweet.

lae
2017-08-30 21:49
if wanted I can clean this up a bit and publish it but

lae
2017-08-30 21:51
I'd like to not use `get_url` and `unarchive` to get around making downloading/extracting the DRP release appear idempotent to ansible (there's a bug with regards to using `unzip` in the `unarchive` module)

lae
2017-08-30 21:54

lae
2017-08-30 21:56
ah nah it was this that I had read previously https://github.com/ansible/ansible/pull/24580#issuecomment-302592533

lae
2017-08-30 21:56
(so preferably I'd want releases to be tarballs)

greg
2017-08-30 22:04
that is why we use bsdtar. It processes zips and does the right thing, usually.

zehicle
2017-08-30 22:42
@lae would you be interested in contributing this to the project?

lae
2017-08-30 22:46
I have already :sweat_smile:

lae
2017-08-30 22:47
(just a PR to provision-content so far)

zehicle
2017-08-30 22:48
you're a rock star! thanks

2017-09-02 05:28
How do I get the static url for an iso?

2017-09-02 05:28
in a template?

2017-09-02 05:29
I'm having to do some non standard things to install this iso

2017-09-02 05:29
so i'm booting into sledgehammer and then i'm going to dd it

2017-09-02 05:29
but I can't figure out how to download the iso

shane
2017-09-02 05:52
@hadees - what version of the product are you using ?

shane
2017-09-02 05:53
are you trying to pull the ISO down from the provisioning server to the target server ?

shane
2017-09-02 06:03
if you are trying to pull down ISOs from your DRP server - it looks like you should be able to grab them from a similar URL: http://<drp_server>:8091/isos/ubuntu-16.04.3-server-amd64.iso

shane
2017-09-02 06:04
http://<drp_server>:8091/isos/sledgehammer-b3c09ebd5a9c228c66d8a617b6f5d10ccbe1c273.tar

2017-09-02 09:05
@hadees did you want to DD the image or is that a workaround? we've been working on some advanced integrations that use DD from Packer or similar instead of kickstart.

2017-09-02 09:07
@zehicle I ended up just uploading it as a file

2017-09-02 09:07
that worked out better anyway because it was compressed

2017-09-02 09:07
i kept running out of room if i had the whole iso

2017-09-02 09:07
although streaming the iso for dd would be awesome

2017-09-02 09:18
uploading it as file is a good idea if you just want to download it to the machines. the ISO upload is designed for the system to manipulate into boot images, not for direct retrieval

greg
2017-09-02 18:22
@shane - is correct. There are some additional things that are available too. If you are doing work in a template, you can do `{{ .ProvisionerURL }}` Which dynamically expands to `http://<drp_server>:<drp_webport>`. If you are running in sledgehammer, the cmdline for the kernel also has the provisioner Url. That is if you don't want to hardcode things.
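
For the kernel command line route mentioned above, a minimal sketch of pulling one `key=value` entry out of `/proc/cmdline`. The function name `cmdline_value` is made up, and the key name `provisioner.web` in the usage is an assumption; check `cat /proc/cmdline` on a booted sledgehammer node for the real key.

```sh
# cmdline_value KEY [CMDLINE]
# Prints the value of the first KEY=value word in CMDLINE
# (defaults to the running kernel's /proc/cmdline). Fails if KEY is absent.
# The body runs in a subshell so "set -f" does not leak to the caller.
cmdline_value() (
    set -f    # no glob expansion while splitting into words
    for word in ${2-$(cat /proc/cmdline)}; do
        case $word in
            "$1"=*) printf '%s\n' "${word#*=}"; exit 0 ;;
        esac
    done
    exit 1
)

# Usage (key name is an assumption, see above):
#   url=$(cmdline_value provisioner.web) && curl -fO "$url/isos/<iso_name>"
```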

wdennis
2017-09-07 21:33
@greg Having an issue with my DRP install - was working fine, but then after rebooting the DRP admin node, I set a target machine for the desired bootenv, but when PXE-booted, TFTP is failing...

wdennis
2017-09-07 21:33
It's a 3.0.5 installation btw

shane
2017-09-07 21:34
@wdennis - does your pxe target get a DHCP addr from the DRP admin node ?

wdennis
2017-09-07 21:34
It actually gets one from the subnet router, which then hands it next-server

shane
2017-09-07 21:37
did you set dr-provision service to start on boot: ```sudo systemctl daemon-reload && sudo systemctl enable dr-provision```

wdennis
2017-09-07 21:37
Next-server is the DRP admin nodes IP, and file name is 'lpxelinux.0'

wdennis
2017-09-07 21:38
Yes, DRP is started and running

shane
2017-09-07 21:38
if the `dr-provision` service is running - it's easy to test the TFTP

shane
2017-09-07 21:40
from a host outside of the drp server - simply do:
```
tftp <drp_server>
tftp> verbose
Verbose mode on.
tftp> get lpxelinux.0
getting from drp_server:lpxelinux.0 to lpxelinux.0 [netascii]
Received 92766 bytes in 0.1 seconds [5926590 bit/s]
tftp> quit
```

shane
2017-09-07 21:40
you should be able to snag that file (assuming no FW or iptables/etc entries interfering)

shane
2017-09-07 21:42
if that is working - I'd suggest chucking `tcpdump` on your network interface and listen for incoming requests to make sure your next-boot hand off is working, and your pxe client is actually able to route to the drp server ```sudo tcpdump -i <network_interface_name> port 67 or port 68 or port 69```

wdennis
2017-09-07 21:44
@shane Yup, damn 'iptables' rules...

wdennis
2017-09-07 21:45
Had full DR on this node prior, that had Docker and spawned the wealth of iptables rules...

shane
2017-09-07 21:45
:slightly_smiling_face:

wdennis
2017-09-07 21:45
Bites me every time I (infrequently) reboot the DRP server...

shane
2017-09-07 21:46
what distro you using ?

shane
2017-09-07 21:46
I use an "iptables-restore" tool in Ubuntu to save/restore rules on reboots ... if I'm not using UFW

wdennis
2017-09-07 21:47
For DRP admin node, CentOS 7.3.1611

shane
2017-09-07 21:48
looks like centos has iptables-restore in yum repo too

shane
2017-09-07 21:48


wdennis
2017-09-07 21:54
Hmmm, something still awry- now I have no machines showing up in inventory!!

shane
2017-09-07 21:54
is it set to use a different bootenv than local ?
```
drpcli machines show <UUID> | grep BootEnv
  "BootEnv": "local",
```
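
When the bootenv needs to be used in a script rather than read by eye, the value can be extracted instead of grepping the whole line. A minimal sketch without `jq`, assuming the JSON is pretty-printed one field per line as in the output above; the function name `bootenv_of` is hypothetical:

```sh
# bootenv_of
# Reads machine JSON on stdin and prints the first "BootEnv" value.
bootenv_of() {
    sed -n 's/.*"BootEnv"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   drpcli machines show <UUID> | bootenv_of
```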

wdennis
2017-09-07 21:56
Ah, wrong value used for DRP "--data-root="

shane
2017-09-07 21:56
are you running in "production" mode ?

shane
2017-09-07 21:56
in production mode - data-root is in `/var/lib/dr-provision`

wdennis
2017-09-07 21:56
Oh hell no :)

shane
2017-09-07 21:57
(meaning the installer mode - not really in "production") :slightly_smiling_face:

wdennis
2017-09-07 21:57
Kicking DRP off by hand

shane
2017-09-07 21:57
if you install with the ```--isolated``` flag to the installer, it drops everything in your local user's home directory

shane
2017-09-07 21:58
otherwise it does "production" mode - which is the ```/var/lib/dr-provision``` and ```/var/lib/tftpboot``` directories (Ubuntu-land)


wdennis
2017-09-07 21:58
Yeah I did --isolated

wdennis
2017-09-07 21:59
All is working now :sweat_smile:

shane
2017-09-07 22:00
fabulous !

wdennis
2017-09-07 22:00
@shane Are you a part of the RackN gang, or a user?

shane
2017-09-07 22:00
I'm the FNG at rackn

wdennis
2017-09-07 22:01
LOL

wdennis
2017-09-07 22:01
I'm still the FNG here at my job, 11 years in :stuck_out_tongue_winking_eye:

wdennis
2017-09-07 22:02
Well, nice to virtually meet you :)

shane
2017-09-07 22:02
You too!

wdennis
2017-09-07 22:02
Was able to meet Greg, Rob and Victor at DevOpsDays Austin this year

wdennis
2017-09-07 22:02
In person :)

shane
2017-09-07 22:02
I'm sorry you had to go through that ...

wdennis
2017-09-07 22:03
They were nice enough to this poor old chap

shane
2017-09-07 22:03
:slightly_smiling_face:

wdennis
2017-09-07 22:04
How close are we to 3.1 goodness?

shane
2017-09-07 22:06
...being the FNG... I'll need to check w/ the rest of the gang - give me a few mins

shane
2017-09-07 22:06
(FNG == day 3 :slightly_smiling_face: )

wdennis
2017-09-07 22:07
Threw you right in there, huh

shane
2017-09-07 22:07
nothing like trial by fire ... I'm used to it - I've known the team for a long time - so they didn't cut me any slack

wdennis
2017-09-07 22:08
Looks like they put you on Slack instead of cutting you any... <drumroll /><cymbal />

shane
2017-09-07 22:10
doh!

wdennis
2017-09-07 22:10
Be here all week...

shane
2017-09-07 22:28
@wdennis - the unofficial answer is "about 2 weeks-ish time frame" ...

greg
2017-09-07 22:31
:slightly_smiling_face:

zehicle
2017-09-08 18:50
Lets plan a release meeting next week to review in open

2017-09-08 19:16
I would like to create a 5 node cluster like: http://node.mu/2016/12/19/5-node-nano-itx-kubernetes-tower/ but also test DigitalRebar on a 6th node - what motherboards can you recommend ?

shane
2017-09-08 20:10
Hey @maymann - there aren't any real requirements for a mobo for a Digital.Rebar provisioning node. Pretty much anything that can support a Linux distribution will work - if it'll run Ubuntu/CentOS ... etc ... you'll be just fine

shane
2017-09-08 20:10
obviously you'll need a network interface for the OS and imaging activities ...

shane
2017-09-08 20:12
with that in mind - any nano/mini/ITX mobo capable of running Linux and supporting a NIC - will work for you

shane
2017-09-08 20:25
Our pre-built binaries do have a somewhat limited support for processor architecture that we compile for - but since our binaries are implemented in Go Lang - you should be able to compile if you end up using a hardware architecture we don't currently pre-compile for.

shane
2017-09-08 20:33
currently compiled binary support is as follows:
```
bin/linux/amd64/
bin/linux/386/
bin/darwin/amd64/
bin/darwin/386/
bin/windows/amd64/
bin/windows/386/
```
Notes:
"darwin" being a Mac OS X platform - if you're not familiar with that designation
"386" being a 32 bit build and "amd64" being a 64 bit build
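
Picking the right directory from that list can be automated from `uname` output. A minimal sketch, with a hypothetical function name `drp_bin_dir`; it fails for any platform outside the six pre-built combinations, which is the cue to compile from source instead:

```sh
# drp_bin_dir KERNEL MACHINE
# Maps "uname -s" and "uname -m" values to one of the pre-built binary
# directories listed above. Returns non-zero for unsupported platforms.
drp_bin_dir() {
    case $1 in
        Linux)                os=linux ;;
        Darwin)               os=darwin ;;
        MINGW*|MSYS*|CYGWIN*) os=windows ;;
        *)                    return 1 ;;
    esac
    case $2 in
        x86_64|amd64)        arch=amd64 ;;
        i386|i486|i586|i686) arch=386 ;;
        *)                   return 1 ;;
    esac
    printf 'bin/%s/%s\n' "$os" "$arch"
}

# Usage:
#   dir=$(drp_bin_dir "$(uname -s)" "$(uname -m)") || echo "no pre-built binary"
```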

2017-09-08 21:06
Hi all - just started playing with DRP and found the Ubuntu bootenvs file is outdated. On my harddrive: assets/bootenvs/ubuntu-16.04.yml -- iso doesn't exist on the mirror - 16.04.2 needs to be updated to 16.04.3 in the yml file. I'd push a patch myself but it would probably take me an hour (or more) to figure out how :(

shane
2017-09-08 21:10
Hi @aimee - welcome !

shane
2017-09-08 21:10
we have updated the repo with that change, but it hasn't yet been pushed in to a newly cut version yet

2017-09-08 21:11
zehicle: thanks! I noticed the SHA value has to be updated as well. Also for Debian the SHA is listed as BAD in the UI

shane
2017-09-08 21:12
Aimee - to short cut around the problem - you can update your current assets bootenv file with the following:
```
IsoFile: "ubuntu-16.04.3-server-amd64.iso"
IsoSha256: "a06cd926f5855d4f21fb4bc9978a35312f815fbda0d0ef7fdc846861f4fc4600"
IsoUrl: "http://mirrors.kernel.org/ubuntu-releases/16.04/ubuntu-16.04.3-server-amd64.iso"
```
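
Because the `IsoSha256` value has to match the ISO exactly, it is worth checking a downloaded file before editing the bootenv. A minimal sketch, with a hypothetical function name `verify_sha256`; it uses `sha256sum` where available and falls back to `shasum -a 256` (as found on macOS):

```sh
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only when the SHA-256 digest of FILE equals EXPECTED_HEX.
verify_sha256() {
    if command -v sha256sum >/dev/null 2>&1; then
        actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    else
        actual=$(shasum -a 256 "$1" | cut -d ' ' -f 1)
    fi
    [ "$actual" = "$2" ]
}

# Usage:
#   verify_sha256 ubuntu-16.04.3-server-amd64.iso \
#       a06cd926f5855d4f21fb4bc9978a35312f815fbda0d0ef7fdc846861f4fc4600 \
#       && echo "ISO matches the bootenv"
```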

shane
2017-09-08 21:12
this change has been made in `master`, though - so the next release will have it

2017-09-08 21:14
Thanks zehicle. I still have some coding know-how left... haven't completely gone over to the sysadmin/management dark side. (LOL). I have to say I'm liking DRP a lot - at least I understand more of it compared to MAAS (blech), Foreman, RackHD. I'm looking forward to using it more and maybe even contributing if I can dust off my hacking skills.

shane
2017-09-08 21:14
awesome! we look forward to helping you along your journey ... and look forward to any contributions back to DRP you can help with :slightly_smiling_face:

greg
2017-09-08 21:15
And I like the compliment. Our goal has been to be simple

2017-09-08 21:16
Thanks zehicle - it will be an interesting journey - hoping to use DRP along with the Moby Project's Infrakit to manage/provision my cloud native POC lab.

greg
2017-09-08 21:17
Also as a reminder. We are generally slack based but have a bot that relays from IRC and gitter to slack and back. The user IRC and gitter see talking is zehicle, but that is a munging. Like this is @shane and @greg right now. :slightly_smiling_face:

2017-09-08 21:17
I was just going to ask if you were on slack - noticed the relay. I don't have slack installed on this machine - gee I'd have to roll across my office to use slack...

greg
2017-09-08 21:20
We can hook you up if you want slack access

2017-09-08 21:20
Hopped onto Gitter but yes, please, hook me up with Slack

greg
2017-09-08 21:22
send an email - with your email. :slightly_smiling_face:

2017-09-08 21:22
Awesome - thanks gentlemen!

aimeeu
2017-09-08 21:28
has joined #json

shane
2017-09-08 21:29
@aimeeu ... welcome ... :slightly_smiling_face:

aimeeu
2017-09-08 21:31
Thanks @shane

rackn_eng
2017-09-08 22:52
has joined #json



2017-09-09 18:42
testing new Eng user from Gitter

2017-09-09 18:42
seems to go to both #community and #community1

zehicle
2017-09-09 18:43
working on it

2017-09-09 18:54
I've got the connection from Gitter to Slack working

2017-09-09 18:54
checking from alias

shane
2017-09-09 18:55
woot!

zehicle
2017-09-09 18:55
Slack to IRC is not working yet...

shane
2017-09-09 18:55
boo

zehicle
2017-09-09 18:55
mostly because I messed up the auth... soon

2017-09-11 13:41
I have a few nube questions. I have been reading the docs and am excited to use Digital Rebar. It's nice! Question. If I wanted to bind user access to say a local LDAP or AD using SAML or just LDAP is there a way to do that?

shane
2017-09-11 14:32
hey cbitter78_twitter -welcome !

2017-09-11 14:32
Thanks

shane
2017-09-11 14:36
What is your use case for the LDAP/AD auth - are you looking to build a self-service solution for provisioning - or are you looking to control auth to multiple provisioning endpoints ?

2017-09-11 14:37
More of the self service model, wherein I can have a set of inventory in a data center and then control access via roles etc., and perhaps bind those to AD users / groups

shane
2017-09-11 14:37
Sure - makes sense.

2017-09-11 14:37
so I can give the Dev group access to racks 2 - 3 and prod racks 4 - 10

shane
2017-09-11 14:38
Presumably you're looking at the Digital Rebar Provisioning (DRP) solution right now ?

2017-09-11 14:38
yes.

shane
2017-09-11 14:38
The current release version is 3.0.5 - have you played with it yet - and if so, what version ?

2017-09-11 14:38
I am doing read the docs discovery right now. ;)

shane
2017-09-11 14:39
we're just about to release 3.1 in to the community (probably another 2 weeks) ...

shane
2017-09-11 14:39
the LDAP/AD auth solution is a current near term road map feature - but that will be a RackN enterprise feature

2017-09-11 14:39
Cool.

2017-09-11 14:40
another nube question. RackN Enterprise is Digital Rebar Paid?

shane
2017-09-11 14:40
we definitely have it on the roadmap, and it's a frequently asked feature both for Self Service and multi-endpoint management

2017-09-11 14:42
Is there a better forum to ask questions about RackN?

shane
2017-09-11 14:43
do you use Slack ? we can give you a slack invite, and we can drop to a D.M.

2017-09-11 14:44
I do use Slack

2017-09-11 14:48
you can send the invite to cbitter78@aol.com


rackn_eng
2017-09-11 15:20
testing IRC to Slack and Gitter

shane
2017-09-11 15:27
@cbitter78_twitter - invite sent, sorry for the delay ...

cbitter78
2017-09-11 15:28
has joined #json

lae
2017-09-11 15:29
there's a native Slack IRC gateway, if an admin enables it for this server

lae
2017-09-11 15:29
(^ followup to discussion with @aimeeu on friday)

2017-09-11 16:16
morning, I have a quick question about provisioning, does Rebar support Cloud-Init?

shane
2017-09-11 16:52
hey Telmo - what are you trying to do with cloud-init ? what's the use case ?

shane
2017-09-11 16:54
we do have some current support for cloud-init, which is going to emerge in a release or two - at the moment it's not very well productized, but we're getting there

shane
2017-09-11 16:54
we're hoping you can help out with your use case - to help guide where we go with it

2017-09-11 17:14
we are looking into provisioning alternatives and I am working on the market analysis for product capabilities. The use case is to have "Code as Infrastructure" that is portable across solutions. Having the Cloud-Init scripts in a CVS repository where host or hosts groups specific configurations that can be accessed by physical provisioning and "Cloud Provisioning" so hosts/instances build are as identical as possible

zehicle
2017-09-11 23:25
@Telmo, what other tools are you using? Ansible? Terraform? Chef?

zehicle
2017-09-12 02:21

shane
2017-09-12 02:32
10 please

zehicle
2017-09-12 22:50
I'll have them at Hashicorp Fest next week if anyone is in town!

zehicle
2017-09-12 22:51
We're looking for people who want to help beta test our Terraform to Metal provider

2017-09-13 03:30
Hello! How do I update DR without losing all current settings and nodes? Thank you.

shane
2017-09-13 03:31
Hey lion_kg_twitter - welcome

shane
2017-09-13 03:31
what version are you currently running - and what version are you looking to upgrade to ?

2017-09-13 05:08
shane, where can I see current version?

shane
2017-09-13 13:14
lion_kg_twitter - if you are running Digital Rebar Provision - then you can get the version via `drpcli version` - if you have a Digital Rebar v2 server - you'll have a "rebar" command - which would be `rebar version`. Your binaries may be installed in different locations - depending on how you did the original install.
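
The version strings quoted elsewhere in this channel (`v3.0.4-tip-235-9f07...` and `Version: v3.0.5-0-2b32...`) suggest that tip builds carry a `-tip-` marker. A minimal sketch that relies on that observation only; the function name `drp_version_channel` and the labels it prints are made up:

```sh
# drp_version_channel VERSION_STRING
# Prints "tip" for version strings containing "-tip-", "release" for other
# vN... strings. Accepts the string with or without a "Version: " prefix.
drp_version_channel() {
    v=${1#Version: }
    case $v in
        v[0-9]*-tip-*) echo tip ;;
        v[0-9]*)       echo release ;;
        *)             return 1 ;;
    esac
}

# Usage:
#   drp_version_channel "$(drpcli version)"
```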

2017-09-14 16:52
just starting out..do i need both a V2 and a V3 (DRP) server? or could i just start out with v3?

shane
2017-09-14 16:53
@smartekb - highly recommend you ONLY use DRPv3 server ...

shane
2017-09-14 16:53
we are very close to releasing the DRPv3 ver 3.1 release - likely next week - right now - the current released version is 3.0.5

2017-09-14 16:55
thx Shane. I'm able to pxe boot a VM using sledgehammer...after the VM boots up, I have not been able to get it to provision an OS (after setting its bootenv to CentOS)

shane
2017-09-14 16:57
did you load the ISO ? you need to run `drpcli bootenvs uploadiso <iso_name>`

shane
2017-09-14 16:58
by default DRP is distributed without any "content" - you need to initialize/add content to be able to deploy it

shane
2017-09-14 17:02
which version are you running? (`drpcli version`)

2017-09-14 17:03
i have the iso..it shows all green in the ux. I'm running version 3.05

2017-09-14 17:03
i'll re-run that command again, but i seem to have issues with the uploadiso command

shane
2017-09-14 17:04
can you provide the command/output of your uploadiso attempt ?

shane
2017-09-14 17:06
in 3.0.5 - you should have an "assets" directory and a "tools" directory in the base working directory when you did the install

2017-09-14 17:07
i do have those

shane
2017-09-14 17:07
if you do:
```
cd assets
../tools/discovery-load.sh
```
that is a helper script (needs to run from the `assets` directory) that will load content using the YAML definition files in the `assets/bootenvs` directory

shane
2017-09-14 17:08
you can also JUST load a specific piece of content via: `drpcli bootenvs install bootenvs/centos-7.3.1611.yml` for example

shane
2017-09-14 17:09
that `tools/discovery-load.sh` is just a helper - which only runs that one command for different content

shane
2017-09-14 17:10
once the bootenvs load is complete - you should see it in the green UI

shane
2017-09-14 17:13
once you have content loaded ... you need to:
* *you must* also add the Discovery and Sledgehammer bootenvs - they're required
* make sure you have a Subnet defined (via UI "Subnets" or "drpcli subnets" commands)
* boot a machine w/ PXE enabled that hits the DRP endpoint (your provisioning server)
* then assign a "bootenv" to the machine
* reboot your server - and it should kick the PXE provisioning process for with the given

2017-09-14 17:15
Version: v3.0.5-0-2b326b01a5ef733f3fe599cac2c7aaa6e914b17f

shane
2017-09-14 17:19
yep - the above process should be good for your version

shane
2017-09-14 17:19
one note - there is a small bug in the content - for Ubuntu 16.04 - it tries to get 16.04.2 which is replaced by 16.04.3 in the repos

shane
2017-09-14 17:19
centos will work fine

2017-09-14 17:20
`root@cn-ddev03:/opt/drp/assets# ../tools/discovery-load.sh No assets directory to work from. root@cn-ddev03:/opt/drp/assets# ls bootenvs isos profiles startup templates root@cn-ddev03:/opt/drp/assets# ls bootenvs/ centos-6.8.yml debian-8.yml esxi-6u2.yml lk-sshd.yml redhat-7.0.yml ubuntu-14.04.yml centos-7.3.1611.yml discovery.yml lk-k8s-master.yml local.yml scientificlinux-6.8.yml ubuntu-16.04.yml debian-7.yml esxi-650a.yml lk-k8s-node.yml redhat-6.5.yml sledgehammer.yml windows-2012r2.yml`

2017-09-14 17:20
that was terrible..lemme clean that up

shane
2017-09-14 17:21
oops .... sorry, run that from the parent directory of "assets" ... :slightly_smiling_face: my fault

2017-09-14 17:21
`/opt/drp/assets# ../tools/discovery-load.sh No assets directory to work from.`

shane
2017-09-14 17:29
is that discovery load working for you now ?

2017-09-14 17:31
it did..with a couple errors about local not being available..i'm going to the ux to find out

2017-09-14 17:49
root@cn-ddev03:/opt/drp# drpcli bootenvs uploadisos centos-7.3.1611 Access CLI commands relating to bootenvs Usage: drpcli bootenvs [command]

2017-09-14 17:49
seems there is no uploadisos command in the binary?

shane
2017-09-14 17:50
if you don't have the ISO local to your `drpcli` command - use the `drpcli bootenvs install /opt/drp/assets/bootenvs/centos-7.3.1611.yml`

shane
2017-09-14 17:51
if you have the ISO local - do: `drpcli isos upload <ISO_FILE> as centos-7.3.1611`

2017-09-14 17:57
context deadline exceeded ..does that mean a timeout?

shane
2017-09-14 18:11
Yes, timeout with download/upload

shane
2017-09-14 18:12
Is your internet bandwidth constricted?

shane
2017-09-14 18:14
If so, you might download the iso separately, then, use the "drpcli isos upload...." command
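A rough sketch of that "download separately, then upload" workaround, assuming `curl` is available on the endpoint. `fetch_and_upload_iso` and the `CURL`/`DRPCLI` overrides are hypothetical names; the `isos upload <ISO_FILE> as <name>` form is the one quoted earlier in this channel, and the URL is whatever mirror you normally use.

```shell
#!/usr/bin/env bash
# Sketch only: fetch an ISO with resume support, then hand it to drpcli.
# CURL and DRPCLI are hypothetical overrides to allow a dry run.
CURL="${CURL:-curl}"
DRPCLI="${DRPCLI:-drpcli}"

fetch_and_upload_iso() {
  local url="$1" iso_name="$2"
  local file="${url##*/}"   # local file name = last path component of the URL
  # -C - resumes a partial download, which helps on a constrained link
  $CURL -fL -C - -o "$file" "$url" || return 1
  $DRPCLI isos upload "$file" as "$iso_name"
}
```

Splitting the two steps means a slow or flaky download can be resumed on its own, without re-running the bootenv install.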

2017-09-14 18:24
i have the ubuntu 16.04 iso downloaded locally..can i use the isos upload command to bypass the bug?

shane
2017-09-14 18:29
Yes, use the "isos" command

shane
2017-09-14 18:30
In a short bit, I can provide you the yaml with the updated content for Ubuntu - at lunch now....

2017-09-14 18:31
cool..thx

2017-09-14 18:31
`root@cn-ddev03:/opt/drp/assets# drpcli bootenvs install bootenvs/centos-7.3.1611.yml 2017/09/14 14:30:21 Installing bootenv centos-7.3.1611-install 2017/09/14 14:30:21 Uploading isos/CentOS-7-x86_64-Minimal-1611.iso to DigitalRebar Provision Error: Failed to fetch bootenv: centos-7.3.1611-install: context deadline exceeded `

2017-09-14 18:32
timing out..even on a local iso upload

shane
2017-09-14 18:33
Is your drpcli command running on your endpoint you are uploading content to?

2017-09-14 18:33
yea

shane
2017-09-14 18:33
What is the OS distro/version of your server

2017-09-14 18:34
ubuntu 16.04

shane
2017-09-14 18:40
Can you provide filesystem space info (`df -hl`)

shane
2017-09-14 18:51
...and did you install with "--isolated" mode - or production mode (i.e. not specifying the "--isolated" flag)?

2017-09-14 18:53
production mode

2017-09-14 18:54
/dev/mapper/cn--ddev03--vg-root 41G 25G 15G 64% /

shane
2017-09-14 19:11
smartekb - 2 follow up questions:
1. what type of drives is your DRP install on (default location would be /var/lib/dr-provision - if you didn't relocate it)
2. can you provide the output from this quick DD command? `time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile` (running from the same directory/filesystem as your DRP install location, eg /var/lib/dr-provision)

2017-09-14 19:36
root@cn-ddev03:/var/lib/dr-provision# time sh -c "dd if=/dev/zero of=ddfile bs=8k count=250000 && sync"; rm ddfile 250000+0 records in 250000+0 records out 2048000000 bytes (2.0 GB, 1.9 GiB) copied, 7.10381 s, 288 MB/s real 0m14.448s user 0m0.068s sys 0m6.828s

shane
2017-09-14 19:39
We believe you _might_ be hitting a timeout value in the drpcli client - if you're up for testing our latest version (not "stable") - then this issue very well might go away ....

shane
2017-09-14 19:40
if you install again from "tip" - with production mode - then it'll update the existing version .... the only thing you might need to do is re-enable any subnet(s) you might have created previously

shane
2017-09-14 19:41
there is a new version feature that lets you "enable"/"disable" subnets, and the new "tip" version sets a subnet to "disable"

2017-09-14 19:42
thats what I did earlier today :smile:

2017-09-14 19:45
i'll do it again

shane
2017-09-14 19:45
can you please run "drpcli version" ? you showed me version 3.0.5 previously

shane
2017-09-14 19:46
the "tip" will output "3.0.4-tip-...." number - which _looks_ older than 3.0.5 - but it's not :slightly_smiling_face:

2017-09-14 19:56
Version: v3.0.5-0-2b326b01a5ef733f3fe599cac2c7aaa6e914b17f

shane
2017-09-14 19:56
right - that's older than "tip" version - the changes in "tip" should have a fix in it for you

shane
2017-09-14 19:57
but if you do "drpcli version" with "tip" - it'll show a version string of "3.0.4-tip .... " though it looks older - it's actually a newer version :slightly_smiling_face:
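Since the "tip" version number sorts lower than stable, comparing numbers is misleading; checking for the `-tip-` marker is simpler. A minimal sketch, assuming `drpcli version` prints a single `Version: ...` line as pasted above and that tip builds contain `-tip-` in that string as described. `version_string` and `is_tip_build` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: decide whether a `drpcli version` line is a "tip" build.

# Strip the "Version: " prefix if it is present.
version_string() {
  printf '%s\n' "${1#Version: }"
}

# Succeeds when the version string carries the "-tip-" marker.
is_tip_build() {
  case "$(version_string "$1")" in
    *-tip-*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_tip_build "$(drpcli version)" && echo "running tip"`.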

2017-09-14 20:43
how do i do this update? what i've been doing is not working..i'm still showing 3.0.5

shane
2017-09-14 20:49
Change the curl url to "tip", instead of "stable"

greg
2017-09-14 21:21
and add --drp-version=tip

greg
2017-09-14 21:22
```curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/tip/tools/install.sh | bash -s -- --isolated install --drp-version=tip```

shane
2017-09-14 21:24
except ... don't use "--isolated" since you're doing a "production" install :slightly_smiling_face:

greg
2017-09-14 21:28
:slightly_smiling_face: oops.. Thanks

greg
2017-09-14 21:29
Okay better version: ```curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/tip/tools/install.sh | bash -s -- --drp-version=tip install```

2017-09-14 22:03
Greetings folks - can I get some details on how to build an OS install template? It's not super clear to me with the docs for DRP and if it helps I just need CentOS 7

shane
2017-09-14 22:04
@edolnx - if you are fine with starting with a stock "minimal" install - you can use the "ce-centos" template from the community content repo

2017-09-14 22:06
That should be fine. Thanks!

2017-09-14 22:06
I can at least look at it to learn/modify as I go

shane
2017-09-14 22:06
are you running 3.0.5 ? (`drpcli version`)

2017-09-14 22:07
Stable, yeah `Version: v3.0.5-0-2b326b01a5ef733f3fe599cac2c7aaa6e914b17f`

shane
2017-09-14 22:10
did you do the "discovery-load.sh" - or ensure you installed the Sledgehammer and Discovery bootenvs already ?

2017-09-14 22:11
Yes, did the discovery-load

shane
2017-09-14 22:11
cool - did you do an "isolated" install mode or "production" ?

2017-09-14 22:12
I did not do isolated, so production?

shane
2017-09-14 22:12
:slightly_smiling_face: yep

2017-09-14 22:12
I've also had to modify the systemd script to set the IP address and certs correctly

shane
2017-09-14 22:12
ok - cool

shane
2017-09-14 22:12
did your content end up installed in /var/lib/tftpboot ?

2017-09-14 22:13
I think so, my target node booted and was discovered

shane
2017-09-14 22:16
`drpcli bootenvs install assets/bootenvs/centos-7.3.1611.yml` (Need to run that from the base directory you did the original install in - or give the fully qualified path to the "assets" directory)

shane
2017-09-14 22:17
if you have a Centos 7 minimal ISO file on hand already - you can use the "isos" command to just upload the ISO - without doing the download again

shane
2017-09-14 22:18
`drpcli isos upload <ISO_FILE> as centos-7.3.1611`

2017-09-14 22:18
`[kumulus@koiab dr-provision-install]$ drpcli bootenvs install assets/bootenvs/centos-7.3.1611.yml Error: Error determining whether bootenvs dir exists: stat bootenvs: no such file or directory`

shane
2017-09-14 22:23
sorry edolnx - you need to run that command from the "assets" directory - so:
```
cd <somewhere>/assets
drpcli bootenvs install bootenvs/centos-7.3.1611.yml
```

2017-09-14 22:24
That looks better!

shane
2017-09-14 22:24
woot !!

2017-09-15 17:44
i must say, the new interface UI is pretty

2017-09-15 17:44
but does it really phone home?

shane
2017-09-15 17:46
thx! our UI guys will be happy to hear - just a note ... that UI is ... definitely ... still in "Alpha" stage - there are lots of sharp edges that can cut you

2017-09-15 17:47
i'm bleeding all over already

shane
2017-09-15 17:47
definitely don't recommend relying on it yet :slightly_smiling_face:

2017-09-15 17:48
but the url bar appears as if we are hitting rackN..that's not true, is it?

shane
2017-09-15 17:48
the UI is running as a SaaS/Portal in our cloud environment

shane
2017-09-15 17:48
the DRP endpoint (your provisioning server/service) does NOT reach out to the SaaS portal - we do not phone home from DRP endpoint

2017-09-15 17:49
just looks like?

shane
2017-09-15 17:49
however - by accessing the SaaS/Portal from your browser, we can transfer content from the SaaS/Portal to your DRP endpoint

shane
2017-09-15 17:49
the Saas/Portal is hosted on our side

shane
2017-09-15 17:50
since this isn't released yet - we haven't published the docs and pretty pictures that help explain the flow

shane
2017-09-15 17:51
(note my edit: "by accessing the SaaS/Portal from your browser")

shane
2017-09-15 17:55
access and use of the SaaS/Portal goes like this:
* user hits DRP Endpoint UI
* redirected to RackN SaaS/Portal
* DRP endpoint never reaches _out_ to the SaaS/Portal
* content transfers happen as a PULL from the SaaS/Portal to your browser
* content is pushed from your browser to DRP endpoint at your request/authorization
* that's how content is updated via transfer from the SaaS/Portal to your DRP endpoint

shane
2017-09-15 17:55
does that help to make a little more sense on the flow and access ?

2017-09-15 17:56
the redirection happens by itself

2017-09-15 17:57
?

shane
2017-09-15 17:59
well .. you have to open up the URL and point to your DRP Endpoint ... (for example if it's running locally on your laptop: https://127.0.0.1:8092 ) - then, it redirects to the (not official resting place) https://rackn.github.io/provision-ux/ location

2017-09-15 19:28
@rackneng so, what are the solutions that will be available for organizations who need a 100% on-prem solution and want a pretty UI?

greg
2017-09-15 19:34
The UI can be served from the DRP instance itself and can be packaged for that. Since we expect high churn in the short term, we are starting with a web accessed solution.

greg
2017-09-15 19:36
The only current requirement is that the browser being used to manage DRP has access to the internet (web proxies work), but we understand the potential need for completely internalized solution.

2017-09-15 19:50
thanks, will keep that in mind as we begin/continue our evaluation

2017-09-15 20:46
so, is it possible to have it NOT redirect?

shane
2017-09-15 20:47
don't go to the UI url :slightly_smiling_face:

2017-09-15 20:47
ha!

shane
2017-09-15 20:49
the 3.0.5 "green ui" that used to be there is deprecated - the new UI will be "the way forward" ... in the future there may be an option to provide the new UI as fully embedded in the DRP endpoint side - and not a SaaS/Portal hosted by rackn - however, that's not the case for now

shane
2017-09-15 20:49
if the community wishes to build/maintain a UI for the DRP endpoint - they certainly are welcome to

shane
2017-09-15 20:50
we'd happily welcome any commits in that respect

2017-09-16 18:55
I'd add that we believe strongly in decoupling the UX from the DRP service & CLI. The design for the service is to keep it very small, stable and focused. We are seeing use cases where 1000s of DRP endpoints are deployed for embedded edge or top of rack infrastructure. Since the service is API driven, there's no need to bundle UI code into the service.

2017-09-16 21:54
Can I get an invite to the slack channel?

shane
2017-09-16 22:39
Hey @stanchan - you can sign up at https://www.rackn.com/support

2017-09-17 00:33
@stanchan I've got your email - invite sent

zehicle
2017-09-18 15:29
NOTE: Changes to Swagger on Friday have caused build failures. We are investigating...

carl
2017-09-18 19:05
has joined #json

2017-09-18 21:45
if i wanted to deploy windows, what would i have to do to the windows ISO to get windows installation?

lae
2017-09-18 22:11
`drpcli bootenvs update some-os-install - < bootenvs/some-os-install.yml` should refetch the "ISO" if the IsoUrl changed, right? in the context of tarballs. I note ce-sledgehammer has no hash

greg
2017-09-18 22:21
I'll need to answer these a little later. I'll come back to it

lae
2017-09-18 22:22
I meant to type IsoUrl/IsoFile instead of just IsoUrl

lae
2017-09-18 22:22
but alright

zehicle
2017-09-18 23:37
@lae updating the bootenv does not pull the iso

zehicle
2017-09-18 23:38
by design, Provision is push only. it never reaches out by itself. There is an upload for the master (soon 3.1) CLI that will pull then push

lae
2017-09-18 23:45
yeah, I figured updating wouldn't/shouldn't pull the iso, but yeah I didn't see a relevant alternative

lae
2017-09-18 23:48
don't have the need to upload a new iso right now since I just wiped/recreated a bootenv but - is the workflow supposed to be `drpcli bootenvs update os-install - < bootenvs/os-install.yml` and then `drpcli bootenvs uploadiso os-install`?

zehicle
2017-09-19 01:57
@lae yes

zehicle
2017-09-19 01:57
that's the new CLI command that uses the information from the bootenv to pull the ISO

2017-09-19 03:13
@smartekb_twitter there are a few ways to do Windows deploys - generally, it's something we discuss on a call b/c it's environment & process specific.

greg
2017-09-19 03:46
@lae is correct. Changing the IsoURL, IsoFile, and IsoSha in the bootenv will make the bootenv unavailable until the iso matching those values is uploaded.
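The update-then-upload workflow confirmed above could be wrapped roughly like this. It is a sketch only: `refresh_bootenv` and the `DRPCLI` override are hypothetical names, and `uploadiso` is the newer CLI command discussed here, so it will not exist on older `drpcli` builds.

```shell
#!/usr/bin/env bash
# Sketch only: push an edited bootenv definition, then upload the matching ISO.
# DRPCLI is a hypothetical override to allow a dry run.
DRPCLI="${DRPCLI:-drpcli}"

refresh_bootenv() {
  local name="$1" yaml="$2"
  # "-" tells drpcli to read the updated definition from stdin
  $DRPCLI bootenvs update "$name" - < "$yaml" || return 1
  # the bootenv stays unavailable until an ISO matching its Iso* fields exists
  $DRPCLI bootenvs uploadiso "$name"
}
```

For example, `refresh_bootenv os-install bootenvs/os-install.yml`.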

greg
2017-09-19 03:48
@smartekb_twitter - what @zehicle said. There are many ways and we don't support any specific one openly today. We have many custom ways that we've worked through, but nothing consistent enough to support for the community. Hopefully that will change with some time and users.

wdennis
2017-09-20 01:25
Hi @greg @zehicle - interested in learning more about the Terraform integration; may have a use for it...

greg
2017-09-20 01:27
:slightly_smiling_face: I hope to have a video shortly and some information. We are almost there.

wdennis
2017-09-20 01:29
No rush ;)

greg
2017-09-20 01:29
the provider will manipulate machines in DRP from a pool. It will allow you to transition machines through install process.

greg
2017-09-20 01:33
You will be able to do something like this:
```
provider "drp" {
  api_user     = "rocketskates"
  api_password = "r0cketsk8ts"
  api_url      = "https://147.75.73.159:8092"
}

resource "drp_instance" "one_linux_node" {
  count       = 1
  stage       = "centos-7.3.1611-install"
  description = "Linux node installed by centos 7.3"
}
```

greg
2017-09-20 01:34
That will take a machine and install centos on it.

2017-09-20 14:00
hi all, i have a problem with an error message. from my point of view, the problem isn't a failure, it may be just a warning. can anybody explain if i'm wrong or if the message doesn't fit? thanks!

2017-09-20 14:00
TASK [wait for admin convergence [1 upto 20 minutes]] ************************** fatal: [10.241.236.92]: FAILED! => {"changed": true, "cmd": ["/root/digitalrebar/deploy/scripts/wait_for_rebar.sh"], "delta": "0:20:09.411997", "end": "2017-09-20 16:06:32.287249", "failed": true, "rc": 1, "start": "2017-09-20 15:46:22.875252", "stderr": "Took too long for system deployment to appear", "stdout": "Loaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-\n : manager\nThis system is receiving updates from RHN Classic or Red Hat Satellite.\nPackage epel-release-7-7.noarch already installed and latest version\nNothing to do\nLoaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-\n : manager\nThis system is receiving updates from RHN Classic or Red Hat Satellite.\nMetadata Cache Created\nLoaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-\n : manager\nThis system is receiving updates from RHN Classic or Red Hat Satellite.\nPackage jq-1.5-1.el7.x86_64 already installed and latest version\nPackage curl-7.29.0-35.el7.x86_64 already installed and latest version\nNothing to do\nWaiting on system deployment", "stdout_lines": ["Loaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-", " : manager", "This system is receiving updates from RHN Classic or Red Hat Satellite.", "Package epel-release-7-7.noarch already installed and latest version", "Nothing to do", "Loaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-", " : manager", "This system is receiving updates from RHN Classic or Red Hat Satellite.", "Metadata Cache Created", "Loaded plugins: product-id, rhnplugin, search-disabled-repos, subscription-", " : manager", "This system is receiving updates from RHN Classic or Red Hat Satellite.", "Package jq-1.5-1.el7.x86_64 already installed and latest version", "Package curl-7.29.0-35.el7.x86_64 already installed and latest version", "Nothing to do", "Waiting on system 
deployment"], "warnings": []} to retry, use: --limit @digitalrebar.retry

2017-09-20 14:01
i see no reason for a fatal failure and a script break.

greg
2017-09-20 20:01
The V3.1.0 release is out. Use the STABLE link to get the images.

greg
2017-09-20 20:01
the v3.1.0 link doesn't have the correct version in the string. It has the right code, I think.

greg
2017-09-20 20:01
- FYI ^^

2017-09-20 20:03
i synced my local git today and copied the code to the target server. i think this must be the current version.

greg
2017-09-20 20:04
@theta-my - you are using DRv2. My post was about DRP (Digital Rebar Provision). Sorry, I wasn't clear.

2017-09-20 20:04
ok,

zehicle
2017-09-20 20:05
@zehicle uploaded a file: https://rackn.slack.com/files/U02DHRR2L/F775N9WNA/v3_1_release_feature_list.md and commented: we're working to setup the community processes so that this becomes a community document

lae
2017-09-20 20:08
:+1:

lae
2017-09-20 20:08
we could use some stickers :joy:

greg
2017-09-20 20:09
@lae - direct message me your address and I'll see what I can do.

2017-09-20 20:25
how do i download iso's again? example debian 8

greg
2017-09-20 20:26
Which release?

greg
2017-09-20 20:26
in DRP v3.1.0 - ```drpcli bootenvs uploadiso <bootenv name>```

greg
2017-09-20 20:27
in either DRP v3.1.0 or v3.0.5 ```drpcli bootenvs install bootenvs/debian.8... ```

lae
2017-09-21 00:35
@greg is incrementer useful for an end user?

lae
2017-09-21 00:54
anyway, pushed an update for 3.1.0 to the aur package for drpcli

lae
2017-09-21 01:00
so I ran into a little bit of a snag using `uploadiso` (I think this command should give some feedback indicating success and not just return), but that's because I removed `IsoSha256` from my bootenv, but updating the bootenv didn't remove that key and so it kept the old one/couldn't verify the new tarball

lae
2017-09-21 01:01
just leaving `IsoSha256` empty in the bootenv file worked though

wdennis
2017-09-21 01:31
Congrats RackN team on another release! Will have to update and play with it tomorrow...

wdennis
2017-09-21 01:33
How does hosted UI work - http://xxx.rackn.com URL?

shane
2017-09-21 01:39
If you hit port 8092 on your 3.1 endpoint, you'll be redirected to the hosted endpoint

shane
2017-09-21 01:40
Via https

lae
2017-09-21 01:57
that's pretty spiffy

greg
2017-09-21 01:57
@lae it should not get filled in if empty. Legacy bootenvs are missing it. Once I can update sledgehammer the base bootenvs will have Shas

lae
2017-09-21 01:58
nah this was a custom bootenv I built for our appliance operating systems

greg
2017-09-21 01:58
Okay 310 doesn't fill it anymore

lae
2017-09-21 01:59
I had added `IsoSha256` while troubleshooting why the bootenv wasn't updating - then got rid of it when I realised how to use uploadiso

lae
2017-09-21 01:59
but the previous value was stuck until I had added `IsoSha256` to the yaml but left it empty (which then removed it from the installed bootenv)
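If you do want to keep a checksum in the bootenv instead of leaving it empty, it can be computed locally. A minimal sketch, assuming `IsoSha256` holds the hex SHA-256 digest of the ISO or tarball (the chat does not spell out the format). `iso_sha256` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: print the hex SHA-256 digest of a file.
iso_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'       # GNU coreutils
  else
    shasum -a 256 "$1" | awk '{print $1}'   # fallback where only shasum exists
  fi
}
```

The printed value would go into the `IsoSha256` key of the bootenv YAML before running the update.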

lae
2017-09-21 02:01
anyway, good job on the UI, it's looking a lot better (I can actually add profiles for those bootenvs with requiredparams now :smile: )

lae
2017-09-21 02:07
input's kind of cut off here though (and maybe other similar dialogs) https://up.lae.is/i/1505959595-ecc06.png

zehicle
2017-09-21 02:24
It's a reference example

2017-09-21 14:25
i've deployed a cent-os-7 to a host..is there a default login/password?

shane
2017-09-21 14:25
usually the recommended practice would be to inject an SSH key

2017-09-21 14:26
prolly missed that in the docs..could u send me where i can find that?

greg
2017-09-21 14:31
Shane is correct. The root password is set by a parameter; it defaults to RocketSkates

2017-09-22 06:04
Hi

2017-09-22 14:21
hello

shane
2017-09-22 14:41
Howdy

2017-09-22 14:45
can someone show me/point me to how to inject ssh key ? cos the root/RocketSkates is not working for me to login...I'm installing centos-7

shane
2017-09-22 15:25

wdennis
2017-09-22 19:05
@shane Having a bit of a problem upgrading from 3.0.5 to 3.1.0

greg
2017-09-22 19:06
Yeah - @shane has been finding some issues.

greg
2017-09-22 19:07
What are you seeing? Did you have a "production" install?

wdennis
2017-09-22 19:09
I ran the curlbash in my existing "drp" dir where my prior 3.0.5 install was (isolated) - it seemed to run OK, but when I restarted the dr-provision binary, I see I'm still on 3.0.5 :hushed:

greg
2017-09-22 19:09
add --force

greg
2017-09-22 19:09
to make it download the image.

greg
2017-09-22 19:09
but before you do that.

wdennis
2017-09-22 19:10
To the bash flags?

greg
2017-09-22 19:10
Yeah, but wait a moment

wdennis
2017-09-22 19:10
Standing by

greg
2017-09-22 19:10
Can you show me your curl bash command?

shane
2017-09-22 19:10
Give me a few mins

greg
2017-09-22 19:11
even betterer

shane
2017-09-22 19:11
Eating lunch - I have some fixes to install.sh

wdennis
2017-09-22 19:12
I too will eat the lunch then


greg
2017-09-22 19:13
Once Shane is happy, you will want to change stable to tip in the curl part and probably add --force to the bash part. You will still download stable bits, but you will get the fixed install.sh.

wdennis
2017-09-22 19:14
Ah, makes sense

wdennis
2017-09-22 19:52
@shane Ready to go when you are :)

shane
2017-09-22 19:52
copy

shane
2017-09-22 19:53
You currently have a 3.0.5 install in "production" mode (or "system" mode) - correct ?

shane
2017-09-22 19:53
and you want to upgrade to current 3.1.0 mode

shane
2017-09-22 19:53
just want to make sure we're on the same page on what you need to do

wdennis
2017-09-22 19:56
No, it's in 'isolated' mode

wdennis
2017-09-22 19:56
Want to upgrade to 3.1 in isolated

shane
2017-09-22 19:56
ah! much easier path ... :slightly_smiling_face:

wdennis
2017-09-22 19:57
That sounds good :)

shane
2017-09-22 20:47
ok @wdennis -- with the following notes:
* stop dr-provision server first if it's running (install.sh will bomb out if it's running)
* backup your content first ... just in case :slightly_smiling_face:
Give this a go:
```
export VER=tip
curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/${VER}/tools/install.sh | bash -s -- install --isolated --upgrade
```
This will download the current "tip" version of `install.sh`, but perform a `stable` version upgrade (since `--drp-version=VER` isn't specified)
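The two precautions (stop the server, back up content) could be scripted before the upgrade. This is a sketch under assumptions: the helper names are hypothetical, the process is assumed to be named `dr-provision`, and the data directory to back up is whatever your install uses.

```shell
#!/usr/bin/env bash
# Sketch only: pre-upgrade helpers for an isolated install.

# Build the raw install.sh URL for a branch such as "stable" or "tip".
install_script_url() {
  printf 'https://raw.githubusercontent.com/digitalrebar/provision/%s/tools/install.sh\n' "${1:-stable}"
}

# Succeeds if a process named exactly dr-provision is running.
drp_running() {
  pgrep -x dr-provision >/dev/null 2>&1
}

# Archive a data directory into a gzip-compressed tarball.
backup_dir() {
  local src="$1" dest="$2"
  tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}
```

A typical order would be: bail out if `drp_running` succeeds, call `backup_dir` on the data directory, then pipe `curl -fsSL "$(install_script_url tip)"` into `bash` with the flags given above.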

wdennis
2017-09-22 20:51
Backup in progress ;)

wdennis
2017-09-22 21:08
Here we go...

wdennis
2017-09-22 21:11
Just sitting at "Installing Vesion stable..." -- that expected?

shane
2017-09-22 21:11
it's likely downloading content - you have a slow network connection ??



shane
2017-09-22 21:12
so - it's downloading

wdennis
2017-09-22 21:12
Not super slow - on commercial cable internet

wdennis
2017-09-22 21:14
Ok, finally completed

wdennis
2017-09-22 21:17
All above done....

wdennis
2017-09-22 21:18
Now start DRP the way I used to?

shane
2017-09-22 21:18
yep

wdennis
2017-09-22 21:19
Gah - still says Version: v3.0.5-0-....

greg
2017-09-22 21:21
make sure you didn't copy drpcli and dr-provision into /usr/local/bin or /usr/bin or somewhere.

wdennis
2017-09-22 21:22
No, I'm running it from cwd which is the isolated dir I installed into



wdennis
2017-09-22 21:28
So I think the problem is that my binary in ./bin/linux/amd64/dr-provision is dated July 3 2017

shane
2017-09-22 21:29
in the base directory you did the install - run the binary that was installed to `bin/linux/amd64/dr-provision` (assuming 64 bit linux, of course)

shane
2017-09-22 21:29
that should have been updated :slightly_smiling_face:

greg
2017-09-22 21:29
hmmm - it exited early - it looks like

wdennis
2017-09-22 21:30
They are symlinks

greg
2017-09-22 21:31
Try this:
```
rm -f dr-provision.zip
rm -f dr-provision.sha256
rm -f sha256sums
```


greg
2017-09-22 21:31
Then do the install command again, but add `--debug` as well to the bash section

wdennis
2017-09-22 21:32
@greg OK, done

wdennis
2017-09-22 21:32
Ack

wdennis
2017-09-22 21:36
Looks like it is downloading 'dr-provision.zip'

wdennis
2017-09-22 21:38
But, I got nothing happening when I do an 'iftop'

greg
2017-09-22 21:38
Is the install script still running?

wdennis
2017-09-22 21:40
Yes, but looks like I may have a local network issue... 44% packet loss measured :white_frowning_face:

wdennis
2017-09-22 21:41
Trying to resolve that, stand by

wdennis
2017-09-22 21:45
Ok, looks better, retrying

wdennis
2017-09-22 21:46
Looks like the DRP server is talking to AWS S3 now :)

wdennis
2017-09-22 21:47
Working but only pulling 50Kb/sec :disappointed:

wdennis
2017-09-22 21:50
OK, file date on dr-provision binary is 9/20/2017 now

wdennis
2017-09-22 21:52
Yay, v3.1.0-0-... on startup now :slightly_smiling_face:

wdennis
2017-09-22 21:54
How to set the DRP auth token for the remote UI?

greg
2017-09-22 21:58

greg
2017-09-22 21:58
should redirect you to an auth.

wdennis
2017-09-22 21:59
@greg @shane Looks like I have to set up the ISOs, machines, etc all over again?

wdennis
2017-09-22 21:59
I did get the remote UI

wdennis
2017-09-22 21:59
And am in it

wdennis
2017-09-22 22:00
But all my machines, ISOs, etc are not showing up

greg
2017-09-22 22:00
umm

greg
2017-09-22 22:01
okay - wait

wdennis
2017-09-22 22:01
UI shore is pretty tho :)

greg
2017-09-22 22:01
stop dr-provision

wdennis
2017-09-22 22:01
OK done

greg
2017-09-22 22:02
```sudo ./dr-provision --static-ip=192.168.1.158 --disable-dhcp --base-root=/home/dradmin/drp/drp-data --local-content="" --default-content=""```

wdennis
2017-09-22 22:06
Ok, let me try again

wdennis
2017-09-22 22:07
Phew :sweat_smile:

wdennis
2017-09-22 22:09
Now to commence the learnin'

wdennis
2017-09-22 22:11
So are there UI docs written yet?

greg
2017-09-22 22:13
oh no

greg
2017-09-22 22:13
:slightly_smiling_face:

greg
2017-09-22 22:13
previous usage should work. famous last words

wdennis
2017-09-22 22:13
TODO ? :stuck_out_tongue_winking_eye:

greg
2017-09-22 22:16
yes

zehicle
2017-09-23 20:39
If you've been using the new RackN UX then you've noticed the login buttons - those should be working now. There are a few changes in progress to save login sessions and allow storing endpoints if you are a registered user. There is NO COST for registering. The login is NOT required to use the endpoint admin features; however it will be required to access RackN content in the future.

wdennis
2017-09-24 03:07

rackn_eng
2017-09-24 04:49
yes - fix in process.

zehicle
2017-09-24 04:55
fix is in place

2017-09-25 23:27
A n00b here. Any pointers on how to figure out why a machine is getting a dhcp lease from dr-prov, but not completing the pxe boot?

shane
2017-09-25 23:28
Hello @nzsouthernman, welcome! What version of Digital Rebar Provision are you running ?

2017-09-25 23:28
check "drpcli prefs list" - make sure that you have set the unknown bootenv and default bootenv

2017-09-25 23:30
Running whichever version is 'stable' as of this morning.

2017-09-25 23:30
defaultboot is sledgehammer, unknown is discovery.

shane
2017-09-25 23:31
Excellent -that should be the 3.1.0 release - you can verify this with `drpcli version`

2017-09-25 23:32
Version: v3.1.0-0-b70cf8ee1f61844a6d64070a8b272c2bec512204

2017-09-25 23:32
:)

2017-09-25 23:32
Currently running host & pxeclient on vmware, on an isolated vswitch that has a firewall between it & our core.

shane
2017-09-25 23:33
you can also take a look at things with the UI - if you haven't seen it already - simply point your web browser to your DRP endpoint w/ https and port 8092

shane
2017-09-25 23:34
The UI is running as the RackN portal - and your web browser proxies the connection to your DRP endpoint - your DRP endpoint does NOT reach out to the Portal

2017-09-25 23:35
Yeah, got the ui running (it's very nice btw), and checked the ISO's and earlier found that the initial sledgehammer I uploaded from the cli wasn't what the ui wanted. Downloaded and uploaded via the ui the sledgehammer b68 version and that appears to have made it happy.

shane
2017-09-25 23:36
presumably you see your pxeclient show up under the `Machines` inventory ?

2017-09-25 23:36
*made the ui happy. The pxeboot still doesn't seem to respond.

shane
2017-09-25 23:36
what bootenv does it show ?

2017-09-25 23:37
Nope. I only see the MAC under the Networking\leases as evidence that something's working.

2017-09-25 23:37
I had dhcp logging going and syslog showed the event.

shane
2017-09-25 23:40
could you please run `drpcli info get` for us?

2017-09-25 23:40
{ "api_port": 8092, "arch": "amd64", "dhcp_enabled": true, "file_port": 8091, "id": "00:50:56:9e:ec:bf", "os": "linux", "prov_enabled": true, "stats": [ { "count": 0, "name": "machines.count" }, { "count": 1, "name": "subnets.count" } ], "tftp_enabled": true, "version": "v3.1.0-0-b70cf8ee1f61844a6d64070a8b272c2bec512204" }


2017-09-25 23:42
Possibly something wrong with the DHCP config?

2017-09-25 23:43
it's a UX issue - can you get the subnet list from the CLI? I'll look at the UX render issue. It would help to have your subnet entry

shane
2017-09-25 23:44
sorry - you're seeing a current UX bug right now (UX is still in tech preview status) ... can you please do from CLI ?

shane
2017-09-25 23:44
:slightly_smiling_face:

2017-09-25 23:44
:D

2017-09-25 23:44
[ { "ActiveEnd": "192.168.1.200", "ActiveLeaseTime": 60, "ActiveStart": "192.168.1.100", "Available": true, "Enabled": true, "Errors": [], "Name": "ens192", "NextServer": "192.168.1.10", "OnlyReservations": false, "Options": [ { "Code": 0, "Value": "[object Object]" }, { "Code": 1, "Value": "255.255.255.0" }, { "Code": 2, "Value": "[object Object]" }, { "Code": 3, "Value": "[object Object]" }, { "Code": 4, "Value": "[object Object]" }, { "Code": 5, "Value": "[object Object]" }, { "Code": 28, "Value": "192.168.1.255" } ], "Pickers": [ "hint", "nextFree", "mostExpired" ], "ReadOnly": false, "ReservedLeaseTime": 7200, "Strategy": "MAC", "Subnet": "192.168.1.10/24", "Validated": true } ]

2017-09-25 23:45
Hmm, maybe I should delete and recreate my subnet from drpcli?

shane
2017-09-25 23:47
what is your "Reservation" strategy? Do you have reservations required ?

2017-09-25 23:48
No, no reservations required. host is 192.168.1.10, fw is 192.168.1.1, dhcp scope set 192.168.1.100-192.168.1.200. It's a test ip range, so pretty much anything is fine.

shane
2017-09-25 23:57
can you add a TFTP client to your endpoint (eg "yum install tftp" or "apt -y install tftp-hpa") then check that TFTP is working for you with this basic test:
```
tftp 127.0.0.1
get default.ipxe
```

shane
2017-09-25 23:58
even better would be a client in the same vswitch that has access to your endpoint - then hit your endpoint IP (192.168.1.10) for same test

2017-09-26 00:00
Testing what? browser to ui? I can bring up a linux vm on that vswitch pretty quickly.

shane
2017-09-26 00:00
testing basic TFTP to the drp endpoint

shane
2017-09-26 00:00
and that your tftpboot is serving the `default.ipxe` file
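
If installing a TFTP client is inconvenient, the same smoke test can be sketched in Python. The code below builds a TFTP read request as laid out in RFC 1350 and returns the first DATA block. `build_rrq` and `fetch_first_block` are hypothetical names, and UDP port 69 is assumed because it is the standard TFTP port; nothing here is DRP-specific.

```python
import socket
import struct

def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", 1)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")

def fetch_first_block(host: str, filename: str, port: int = 69, timeout: float = 3.0) -> bytes:
    """Send a read request and return the payload of the first DATA packet.

    Raises RuntimeError on a TFTP ERROR packet or an unexpected reply, and a
    timeout exception if nothing comes back. No ACK is sent, so the server
    will retransmit and give up; that is acceptable for a one-off check.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, port))
        packet, _server = sock.recvfrom(4 + 512)
    if len(packet) < 4:
        raise RuntimeError("reply too short to be a TFTP packet")
    opcode, number = struct.unpack("!HH", packet[:4])
    if opcode == 5:  # ERROR: number is the error code, the rest is a message
        message = packet[4:].rstrip(b"\x00").decode(errors="replace")
        raise RuntimeError(f"TFTP error {number}: {message}")
    if opcode != 3:  # 3 is DATA
        raise RuntimeError(f"unexpected TFTP opcode {opcode}")
    return packet[4:]
```

Run from a machine on the same vswitch, `fetch_first_block("192.168.1.10", "default.ipxe")` should return the start of the file if the endpoint is serving it.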

2017-09-26 00:14
Adjusted dhcp to look like this;

2017-09-26 00:14
[ { "ActiveEnd": "192.168.1.200", "ActiveLeaseTime": 60, "ActiveStart": "192.168.1.100", "Available": true, "Enabled": true, "Errors": [], "Name": "ens192", "NextServer": "192.168.1.10", "OnlyReservations": false, "Options": [ { "Code": 1, "Value": "255.255.255.0" }, { "Code": 3, "Value": "192.168.1.1" }, { "Code": 6, "Value": "192.168.1.1" }, { "Code": 28, "Value": "192.168.1.255" } ], "Pickers": [ "hint", "nextFree", "mostExpired" ], "ReadOnly": false, "ReservedLeaseTime": 7200, "Strategy": "MAC", "Subnet": "192.168.1.10/24", "Validated": true } ]

2017-09-26 00:15
No pxeboot stuff though. Which option should that go in?

2017-09-26 00:17
{Code: 3, Value: ip}, {Code: 6, Value: ip}, {Code: 15, Value: 'example.com'}, {Code: 67, Value: 'lpxelinux.0'}, ]

2017-09-26 00:17
where IP is generally from the drpcli interfaces list
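
The codes in that list are standard DHCP option numbers from RFC 2132. As a rough reference, the Python sketch below names the ones used in this conversation and builds an Options list in the same shape as the subnet dumps. `build_options` and `describe` are hypothetical helpers, not DRP tooling.

```python
# Standard DHCP option numbers (RFC 2132) that appear in this conversation.
DHCP_OPTION_NAMES = {
    1: "subnet mask",
    3: "router",
    6: "domain name server",
    15: "domain name",
    28: "broadcast address",
    66: "TFTP server name",
    67: "bootfile name",
}

def build_options(values: dict) -> list:
    """Turn {code: value} into [{"Code": ..., "Value": ...}], sorted by code."""
    return [{"Code": code, "Value": str(values[code])} for code in sorted(values)]

def describe(options: list) -> list:
    """One readable line per option, e.g. ' 67 bootfile name: lpxelinux.0'."""
    return [
        f"{opt['Code']:>3} {DHCP_OPTION_NAMES.get(opt['Code'], 'unknown')}: {opt['Value']}"
        for opt in options
    ]

options = build_options({
    3: "192.168.1.1",
    6: "192.168.1.1",
    15: "example.com",
    67: "lpxelinux.0",
})
print("\n".join(describe(options)))
```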

2017-09-26 00:18
sorry about the UX bug... will get that updated tonight

2017-09-26 00:22
No worries about the UX, that stuff happens in products under dev. Current DHCP schema follows;

2017-09-26 00:22
{ "ActiveEnd": "192.168.1.200", "ActiveLeaseTime": 60, "ActiveStart": "192.168.1.100", "Available": true, "Enabled": true, "Errors": [], "Name": "ens192", "NextServer": "192.168.1.10", "OnlyReservations": false, "Options": [ { "Code": 1, "Value": "255.255.255.0" }, { "Code": 3, "Value": "192.168.1.1" }, { "Code": 6, "Value": "192.168.1.1" }, { "Code": 15, "Value": "burnside.school.nz" }, { "Code": 28, "Value": "192.168.1.255" }, { "Code": 67, "Value": "lpxelinux.0" } ], "Pickers": [ "hint", "nextFree", "mostExpired" ], "ReadOnly": false, "ReservedLeaseTime": 7200, "Strategy": "MAC", "Subnet": "192.168.1.10/24", "Validated": true }

shane
2017-09-26 00:24
FYI - your DRP endpoint does have a built in API documentation set - based on swagger: https://127.0.0.1:8092/swagger-ui/#/

2017-09-26 00:24
Unfortunately, still no pxeboot from my test client. The ubuntu VM I popped onto the subnet can browse ok now though.

shane
2017-09-26 00:24
Replace localhost w/ your endpoint URL - and you may need to update the Swagger URL bar with the correct IP addr

shane
2017-09-26 00:24
(though, admittedly - it doesn't document the DHCP option codes :slightly_smiling_face: )

2017-09-26 00:25
Hmmm, the test vm can browse externally, but can't access either the ui, or the swagger ui.

shane
2017-09-26 00:27
is your test vm able to tftp to the drp endpoint ? (also swagger-ui is HTTPS ... won't work on unencrypted http)

2017-09-26 00:28
Gotchya. Swagger works on https. Initially tried to redirect me to 127.0.0.1, but changing the redir to 192.168.1.10 brought up the ui. Will test tftp to the endpoint shortly.

2017-09-26 00:29
would you mind sharing the parameter list you started DRP with? if you did not set the static-ip then it could cause this problem

2017-09-26 00:29
Also, it would be helpful to have your drpcli interfaces show list

2017-09-26 00:31
[ { "ActiveAddress": "192.168.1.10/24", "Addresses": [ "192.168.1.10/24" ], "Index": 3, "Name": "ens192", "ReadOnly": true } ]

2017-09-26 00:31
systemd started the daemon, I'll dig around and see if I can find what it passed to the initial run of it.

2017-09-26 00:32
This was what I told it; sudo systemctl daemon-reload && sudo systemctl enable dr-provision

2017-09-26 00:33
This is what systemd has to say about the service;

2017-09-26 00:33
[Unit]
Description=DigitalRebar Provision Integrated DHCP and File Provisioner
Documentation=http://provision.readthedocs.io/en/latest/
After=network.target
[Service]
ExecStart=/usr/local/bin/dr-provision
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TasksMax=infinity
[Install]
WantedBy=multi-user.target

shane
2017-09-26 00:35
can you please change the ExecStart stanza (in `/etc/systemd/system/dr-provision.service`) to append the following: ```--static-port=192.168.1.10```

2017-09-26 00:36
NOTE: static-ip

shane
2017-09-26 00:36
then `sudo systemctl daemon-reload && sudo systemctl restart dr-provision`

shane
2017-09-26 00:37
doh :slightly_smiling_face: thanks @zehicle - yeah (--static-ip=192.168.1.10) ... cut-n-paste of wrong line

2017-09-26 00:37
● dr-provision.service - DigitalRebar Provision Integrated DHCP and File Provisioner
Loaded: loaded (/etc/systemd/system/dr-provision.service; disabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2017-09-26 13:36:51 NZDT; 1s ago
Docs: http://provision.readthedocs.io/en/latest/
Process: 3294 ExecStart=/usr/local/bin/dr-provision --static-port=192.168.1.10 (code=exited, status=1/FAILURE)
Main PID: 3294 (code=exited, status=1/FAILURE)
Sep 26 13:36:51 lcrowbar01 systemd[1]: Started DigitalRebar Provision Integrated DHCP and File Provisioner.
Sep 26 13:36:51 lcrowbar01 dr-provision[3294]: invalid argument for flag `--static-port' (expected int): strconv.ParseInt: parsing "192.168.1.10": invalid syntax
Sep 26 13:36:51 lcrowbar01 systemd[1]: dr-provision.service: Main process exited, code=exited, status=1/FAILURE
Sep 26 13:36:51 lcrowbar01 systemd[1]: dr-provision.service: Unit entered failed state.
Sep 26 13:36:51 lcrowbar01 systemd[1]: dr-provision.service: Failed with result 'exit-code'.
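
The failure in that status output is a type mismatch: `--static-port` expects an integer and was handed an IP address. The Python sketch below mimics the distinction; it is not dr-provision's flag parser, and `classify` is a hypothetical name.

```python
import ipaddress

def classify(value: str) -> str:
    """Say whether a flag value looks like a port number, an IP address, or neither."""
    try:
        port = int(value)
    except ValueError:
        pass  # not an integer; fall through to the IP check
    else:
        return "port" if 0 < port < 65536 else "integer out of port range"
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return "neither"
    return "ip"

print(classify("192.168.1.10"))  # the value that was passed to --static-port
print(classify("8092"))          # the kind of value a port flag expects
```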

2017-09-26 00:37
Didn't like that

2017-09-26 00:37
feck it, brb...

2017-09-26 00:38
that's better. testing now

2017-09-26 00:43
Ok. Will stop the daemon and run by hand. Will have another go at this tomorrow, thanks for your help so far today gentlemen.

shane
2017-09-26 00:49
@nzsouthernman - we'll look for you on the channel - cheers !!

lae
2017-09-26 07:07
``` {{if .ParamExists "part-scheme"}} {{template "part-seed-"+(.Param "part-scheme")+".tmpl" .}} {{else}} {{template "part-seed-default.tmpl" .}} {{end}} ``` so this isn't valid but I hope it should be obvious what I'm trying to do here - how can I get this effect of picking a template given a particular parameter? (not too familiar with golang/`text/template` and I'm still lost after looking at documentation)
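
The selection logic being asked for is small: build a template name from the parameter when it is set, otherwise use the default. Spelled out in Python purely as an illustration (this is not Go `text/template` syntax and not a DRP API; `pick_template` and the `lvm` scheme name are made up for the example):

```python
def pick_template(params: dict, available: set, key: str = "part-scheme") -> str:
    """Choose part-seed-<scheme>.tmpl when the parameter is set, else the default."""
    default = "part-seed-default.tmpl"
    scheme = params.get(key)
    if not scheme:
        return default
    name = f"part-seed-{scheme}.tmpl"
    # Assumption: a missing template is an error rather than a silent fallback.
    if name not in available:
        raise KeyError(f"no template named {name}")
    return name

available = {"part-seed-default.tmpl", "part-seed-lvm.tmpl"}
print(pick_template({}, available))
print(pick_template({"part-scheme": "lvm"}, available))
```

Failing on an unknown scheme is a design choice in this sketch; falling back to the default would also be reasonable.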

greg
2017-09-26 13:11
```{{template (printf "part-seed-%s.tmpl" (.Param "part-scheme")) .}}```

greg
2017-09-26 13:11
I think, @lae

wdennis
2017-09-26 13:22
@lae @greg Parameterizing the OS install disks and the partitioning thereof should totally be a thing?

greg
2017-09-26 13:24
the os-disk is parameterized. @lae is working on parameterizing the partitioning scheme in a kickstart/preseed.

wdennis
2017-09-26 13:24
Of course, learning "d-i partman" syntax is half the battle :stuck_out_tongue:

greg
2017-09-26 13:25
I suspect that @lae is just coming up with a way to inject templates by variable name.

greg
2017-09-26 13:25
Then lots of templates could be hanging around.

greg
2017-09-26 13:25
by OS - most likely.

wdennis
2017-09-26 13:25
@greg The only edge-case I might see is making a linux-raid out of two/more hard disks before using the md as the disk to install on?

wdennis
2017-09-26 13:26
@lae is trying to do sub-templates?

wdennis
2017-09-26 13:28
It would be a great thing to have community-provided disk partitioning templates for preseed/kickstart

greg
2017-09-26 13:28
sub-templates work.

greg
2017-09-26 13:29
@lae is trying to generate parameterized named templates

wdennis
2017-09-26 13:39
@greg Sorry for my ignorance, but is that an entire "top-level" template (like the standard "net_seed.tmpl") or a sub-template that would nest inside the top-level template?

greg
2017-09-26 13:40
golang text templates allow for templates in templates.

greg
2017-09-26 13:41
DRP originally (3.0.1) didn't. We then added it in like 3.0.3 or 3.0.4.

greg
2017-09-26 13:41
This allows templates to include templates (recursively).

greg
2017-09-26 13:42
v3.1.0 adds tasks and stages which are an alternative to some templates in templates.

greg
2017-09-26 13:42
Kickstarts/preseeds (the non-script parts) are perfect places for sub-templates. The command parts can be done better with tasks/stages now.

wdennis
2017-09-26 13:43
Hopefully we can learn about the new tasks/stages during the community call today

shane
2017-09-26 13:43
@wdennis - we'll be talking about the new v3.1.0 features :slightly_smiling_face:

wdennis
2017-09-26 13:45
Yes, parameter-driven sub-templates would be great for partitioning schemes - a collection of such templates could be community (or DR) provided, and less wheels reinvented

wdennis
2017-09-26 13:45
@shane, Thanks! will check this stuff out before the call

zehicle
2017-09-26 14:23
@wdennis the content system is also a key feature for 3.1 to help those reuse thoughts

wdennis
2017-09-26 17:57
@shane Meetup soon, right?

wdennis
2017-09-26 19:07
Nice work @shane :slightly_smiling_face:

shane
2017-09-26 19:07
Thx @wdennis - pleasure to "virtually meet" you :slightly_smiling_face:

shane
2017-09-26 19:08
- for those of you who couldn't attend the meetup - we'll post the video shortly and provide links here ... keep an eye out for additional announcements on scheduling and agenda items for v002 meetup

lae
2017-09-26 20:09
@greg @wdennis that's exactly what I'm trying to do - have separate partitioning templates based on what workload is required. The "edge-case" of making linux raid is, well, an understatement. There are lots of possible disk layouts, including with LVM (such as making a layout that follows CIS benchmarks (https://www.cisecurity.org/cis-benchmarks/)) - I was originally doing the same thing back when I was using cobbler.

lae
2017-09-26 20:11
Anyway, so I tried what @greg provided but I get this: ``` [lae@yuzu provision-content]$ drpcli templates upload templates/labs-seed.tmpl as labs-seed.tmpl Error: Parse error for template labs-seed.tmpl: template: labs-seed.tmpl:65: unexpected "(" in template clause ```

2017-09-26 20:29
@zehicle Can't seem to find your slack invite. My gmail or uber account?

zehicle
2017-09-26 20:31
@stanchan checking...

zehicle
2017-09-26 20:32
gmail

wdennis
2017-09-26 20:45
@lae Will have to let @greg respond, not a golang guy myself, so not going to try to t'shoot the syntax... But, powerful idea, and looking forward to the solution :slightly_smiling_face:

wdennis
2017-09-26 20:45
My idea would be to have a library of disk partitioning templates that one could choose from for a given deployment

wdennis
2017-09-26 20:47
Most of my use cases use a single disk (could be a RAID virtual disk à la Dell PERC) with partitions for swap and then LVM PV, as is the case with the default net_seed template

2017-09-26 20:47
Time to feed the :bear:!

wdennis
2017-09-26 20:47
But every once in a while, I get a more custom request, which I have to handle when I provision the "n" systems of that type

wdennis
2017-09-26 20:49
@ use should be using the DR bear icon! :stuck_out_tongue:

shane
2017-09-26 20:49
@wdennis - we have a hardware RAID partitioning solution that will be released soon - that piece is a paid-for-content piece, as we've put a LOT of development and testing work in to it ... but it's designed to do a large number of flexible hw raid setups

wdennis
2017-09-26 20:50
@shane sounds very cool, but really thinking more about flexibility of Linux partitioning of one disk

shane
2017-09-26 20:51
are you using ubuntu or centos based distros ?

wdennis
2017-09-26 20:51
yes :stuck_out_tongue:

wdennis
2017-09-26 20:52
Actually mostly Ubuntu these days

shane
2017-09-26 20:52
well - as bad as 'debian installer' is - the 'partman' pieces do have some pretty flexible capabilities - albeit - not very intuitive .... and those could be pretty easily pushed in to a flexible set of templates

wdennis
2017-09-26 20:52
But have folks who occasionally spec CentOS

lae
2017-09-26 20:52
yeah, I'm pretty familiar with d-i's partman now

wdennis
2017-09-26 20:53
Yes, I wish d-i was as easy as kickstart...

shane
2017-09-26 20:53
@lae ... my condolences ... :slightly_smiling_face:

lae
2017-09-26 20:54
so like, I'd prefer to be able to use my own existing seeds with those layouts but at the moment I can only really do a copy of a seed per layout...which is gonna be a pain to manage if I brought every layout I used

lae
2017-09-26 20:54
hence, trying to parametrize the partitioning scheme template

wdennis
2017-09-26 20:54
I'd pay for a sane disk partitioning UI (like the Ubuntu installer's for instance) that generates correct d-i partman lines...

lae
2017-09-26 20:54
disk partitioning UI sounds great

lae
2017-09-26 20:55
lot of work though, don't think that's what rackn worked on

lae
2017-09-26 20:55
(or maybe it is)

wdennis
2017-09-26 20:55
Once you have the correct d-i in a library of configs, it should be easy enough (?) to modify

wdennis
2017-09-26 20:55
Sounds like a community-contrib thing?

wdennis
2017-09-26 20:57
Would be great to have a way to have something like "Ansible Galaxy" for DR community-provided resources

wdennis
2017-09-26 20:57
Exposed thru the DRP UX :slightly_smiling_face:

shane
2017-09-26 20:58
not UI pieces - but templates/profiles and a plugin to drive the raid configs

wdennis
2017-09-26 21:01
What I was trying to say was if there is community-provided content (such as templates, content packs, plugins, etc.), there could be a way to browse them thru the UX (kind of like what's available in GitHub Atom editor in the way of Packages / Themes)

wdennis
2017-09-26 21:02
It would be up to the end-user whether to trust them and utilize them or not (perhaps with a rating system?)

shane
2017-09-26 21:02
@wdennis - we have content browsing already in the UI ...

shane
2017-09-26 21:03
:slightly_smiling_face:

wdennis
2017-09-26 21:03
@shane Need to learn more about that :wink:

shane
2017-09-26 21:03
if you go to the Content page - the right side panel labeled "organization content" is a library of available content to pull in to your local Endpoint

wdennis
2017-09-26 21:03
Not too familiar yet about Content packs, Plugins, Stages, Tasks, Jobs


shane
2017-09-26 21:04
in that view, the middle panel is content I've added; the right panel is the available content

shane
2017-09-26 21:05
I added the content from the content library - but I did do it via the CLI - not UI ... could just as easily have clicked on the UI "Transfer" link

shane
2017-09-26 21:06
the content also is versioned - as you can see if something has an upgrade available - and you can choose to Upgrade your content pack for what you currently have installed

shane
2017-09-26 21:06
in that screen shot - the drp-community-content has an upgrade available

wdennis
2017-09-26 21:06
Yes, I see it in my DRP UI - all the available content right now is RackN-provided?

shane
2017-09-26 21:06
"drp-community-content" is the ... ahem ... Community Content :slightly_smiling_face: freely available

wdennis
2017-09-26 21:07
lol

shane
2017-09-26 21:07
some of that content you see is going to shuffle over to the "free-for-register" content

shane
2017-09-26 21:07
like I mentioned - UX is still Tech Preview ... and we're baking a few of the last bits

wdennis
2017-09-26 21:07
About registering - can I self-create a RackN beta account, and if so, how?

shane
2017-09-26 21:08
@zehicle and team did an amazing job pulling all of that together

wdennis
2017-09-26 21:08
n.m - found "sign up" link

shane
2017-09-26 21:08
Upper right - click on "RackN Login" and then sign up

wdennis
2017-09-26 21:16
OK, did signup

wdennis
2017-09-26 21:16
Went to Content page and tried to d/l update for community content, got error...


shane
2017-09-26 21:18
looks like you tried to add content that already exists

wdennis
2017-09-26 21:18
But anyways, get the idea...

shane
2017-09-26 21:18
"ce-root-access" already exists - you'd have to destroy it first, then re-create

carl
2017-09-26 21:22
So, `explode_iso.sh` is failing for every ISO I have uploaded with an exit code of 255

carl
2017-09-26 21:22
any ideas?

shane
2017-09-26 21:24
@wdennis - you have an older version of the community content, named "Digital Rebar Provision Community Content" (note the long winded name w/ spaces)

shane
2017-09-26 21:24
you probably tried adding the newer pack "drp-community-content" - they're the same thing - just renamed

shane
2017-09-26 21:24
that's a bug I ran in to - and we got it cleaned up a bit

shane
2017-09-26 21:25
you should be able to nuke the "Digital Rebar Provision Community Content" pack then Transfer the "drp-community-content" pack

wdennis
2017-09-26 21:25
Ok

shane
2017-09-26 21:25
Remember the golden rule: UX is Tech Preview still

shane
2017-09-26 21:25
:slightly_smiling_face:

wdennis
2017-09-26 21:29
And a damn fine tech preview it is

shane
2017-09-26 21:36
@carl - can you check the tftpboot/isos/ directory on your DRP Endpoint - the ISO gets staged there with "drpcli bootenvs uploadiso FOO" - then explode_iso.sh runs on it

shane
2017-09-26 21:36
additionally - check the filesystem space of your tftpboot/isos/ directory

shane
2017-09-26 21:37
if you did "isolated" install - then it's going to be in $HOME/drp-data/ location - if you did "production" install - it should be in /var/lib/dr-provision/tftpboot (for 3.1) or /var/lib/tftpboot (for 3.0.x)
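
The free-space check can be scripted. The Python sketch below looks at the three locations named above and reports free space for whichever exist. `free_space_report` is a hypothetical helper, and the "room for about two copies of the ISO" rule (the staged ISO plus its exploded contents) is an assumption of this sketch, not a documented DRP requirement.

```python
import os
import shutil

# Candidate locations named in the conversation.
CANDIDATES = [
    "/var/lib/dr-provision/tftpboot",   # production install, 3.1
    "/var/lib/tftpboot",                # production install, 3.0.x
    os.path.expanduser("~/drp-data"),   # isolated install
]

def free_space_report(paths, iso_size_bytes=0):
    """Return (path, free_bytes, enough) for each path that exists."""
    report = []
    for path in paths:
        if not os.path.isdir(path):
            continue
        free = shutil.disk_usage(path).free
        # Assumption: staging plus exploding needs about two ISO-sized copies.
        report.append((path, free, free >= 2 * iso_size_bytes))
    return report

for path, free, enough in free_space_report(CANDIDATES, iso_size_bytes=1 << 30):
    print(f"{path}: {free / 2**30:.1f} GiB free, {'ok' if enough else 'LOW'}")
```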

greg
2017-09-26 21:51
@lae - I'll try and look at it tonight.

greg
2017-09-26 21:51
I probably messed something up.

lae
2017-09-26 22:03
also thanks for the stickers! (they arrived today)

greg
2017-09-26 22:03
Nice!

spector
2017-09-26 22:10
For anyone who missed the meetup today, here is the video: https://youtu.be/LpVHYY9NdYo

lae
2017-09-26 23:36
hmm

lae
2017-09-26 23:36
I guess I could use ansible as another way to manage my templates

lae
2017-09-26 23:53
@lae uploaded a file: https://rackn.slack.com/files/U54E4SD4G/F79A7LEFL/image.png and commented: oh right - the machine overview is a little unwieldy when a lot of profiles exist

shane
2017-09-27 00:15
@lae - thx for the feedback on that - we know that page can get a bit crowded, and working on some ideas/thoughts around filtering and presentation of that information. If you have any ideas - feel free to share. :slightly_smiling_face:

shane
2017-09-27 00:21
- the v002 meetup details have been posted: meetup link: https://www.meetup.com/digitalrebar/events/243490128/ agenda document: https://docs.google.com/document/d/1FRFI-vONJY9yje9UsBqCI8XhojJ0XARsFgs4jbm-VRk also - vote now - if you'd like to see us move to a Weekly format with more demo/content versus the current published every-other-week format: meetup poll: https://www.meetup.com/digitalrebar/polls/1255504/


wdennis
2017-09-27 02:19
So, question about the RackN login on the DRP UI...

wdennis
2017-09-27 02:20
When I log in, it has my org as "Personal", no way to change that?

wdennis
2017-09-27 02:22
But bigger question - when I use the "sandwich" UI widget to open the left nav, and select Endpoints, I just see "127.0.0.1:8092" which of course is bogus, and there's no way to add actual DRP endpoints to that

shane
2017-09-27 02:23
"tech preview" :relaxed:

shane
2017-09-27 02:24
Multi-endpoint management not baked yet - simply change URL reference for now

wdennis
2017-09-27 02:24
Ah, got it

wdennis
2017-09-27 02:26
Maybe when I run into these problems, you can say I've been "TP'd" (tech previewed) :P

2017-09-27 02:26
Hi Gents. Update on yesterday - the issue is definitely the embedded dhcp server in dr-provision. I switched dhcp out to my firewall and enabled the pxeboot bits to pull off drp's tftp server and the sledgehammer env. booted fine, and now I have a machine registered.


zehicle
2017-09-27 02:31
@wdennis that is exactly the feature set that I'm working on right now... adding an endpoint exposed a validation loop. Once it's working, you'll be able to add/remove endpoints from your org. You'll also be able to share orgs for multiple users. In the meantime, we can manually add endpoints for you on the backend so they're correct on your list if you ask 1x1

wdennis
2017-09-27 02:37
@zehicle Cool, will PM details

greg
2017-09-27 03:02
@nzsouthernman_twitter - we should talk about it some more. We need to fix that. My guess is that the subnet isn't quite right.

2017-09-27 03:03
Is there anything I can provide from my end to help? My working firewall is pfSense, and I just reloaded dr-provision with --disable-dhcp

greg
2017-09-27 03:04
The subnet will still be there

greg
2017-09-27 03:04
It would be nice to see the subnet object and what the network specs are. Is DRP directly attached to the network in question?

2017-09-27 03:05
I first had it working by disabling the subnet in the UX entirely, then enabled fw dhcp server, entered the right next server & lpxelinux.0 file and hey presto!

2017-09-27 03:05
Yes, drp is directly on the network. Network is an esxi 6.5 vswitch.

2017-09-27 03:05
{ "ActiveEnd": "192.168.1.200", "ActiveLeaseTime": 60, "ActiveStart": "192.168.1.100", "Available": true, "Enabled": false, "Errors": [], "Name": "ens192", "NextServer": "192.168.1.10", "OnlyReservations": false, "Options": [ { "Code": 1, "Value": "255.255.255.0" }, { "Code": 3, "Value": "192.168.1.1" }, { "Code": 6, "Value": "192.168.1.1" }, { "Code": 15, "Value": "burnside.school.nz" }, { "Code": 28, "Value": "192.168.1.255" }, { "Code": 66, "Value": "192.168.1.10" }, { "Code": 67, "Value": "lpxelinux.0" } ], "Pickers": [ "hint", "nextFree", "mostExpired" ], "ReadOnly": false, "ReservedLeaseTime": 7200, "Strategy": "MAC", "Subnet": "192.168.1.0/24", "Validated": true }

greg
2017-09-27 03:06
Okay - so - it sounds like it might be subnet config, but it also could be the vswitch. I seem to recall that the vswitch can "help" by not letting broadcasts through in some cases and some settings.

2017-09-27 03:07
The test client is a vm with one vnic, on the drp vswitch. The drp host has one nic, on the drp vswitch. The pfsense fw has two nics, one on drp vswitch and one on our core lan.

greg
2017-09-27 03:07
The subnet looks okay - 66 isn't required. the server fills it in.

2017-09-27 03:08
(all three machines are client vm's on the same esxi host)

2017-09-27 03:09
But you're probably right, it's most likely b/c drp is sitting on esxi. Ubuntu 16.04.3 with open-vm-tools installed, so the vnic should be 10gigE.

greg
2017-09-27 03:10
hmmm - the subnet looks okay. A couple of other things you can try, if you want. DRP has some preferences that control debugging. You can change the debugDhcp preference to 2 (through the cli) or HIGH in the global setup UI page. This will cause the DHCP server part of DRP to dump packet content.

greg
2017-09-27 03:10
That way we can see if DRP is getting and responding to DHCP messages.

greg
2017-09-27 03:11
Well, if we are in an Ubuntu vm, it should be okay.

greg
2017-09-27 03:11
Now, ubuntu firewall rules could get in the way. Not sure what you have there.

2017-09-27 03:11
It is indeed dishing out dhcp leases - I had them coming through yesterday ok.

2017-09-27 03:12
firewall should be disabled - that was one of the install instructions I followed yesterday when setting this all up.

greg
2017-09-27 03:12
okay - cool and good.

2017-09-27 03:13
dr-provision2017/09/27 02:29:48.674105 Received DHCP packet: type Discover xid 0x4f61e751 ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dr-provision2017/09/27 02:29:48.674494 Received DHCP packet: type Request xid 0x4f61e751 ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dr-provision2017/09/27 02:29:48.674718 xid 0x4f61e751: 192.168.1.104 is no longer able to be leased: No lease for 192.168.1.104, covered by subnet 192.168.1.0
These are from before I --disable-dhcp

2017-09-27 03:13
When I fired it up it cleaned out yesterday's leases

2017-09-27 03:14
ooh, ubuntu firewall may have been running... give me a sec...

greg
2017-09-27 03:15
also apparmor can do funky things to DHCP servers (though mostly keep them from starting).

2017-09-27 03:16
dr-provision2017/09/27 03:16:12.941340 Received DHCP packet: type Discover xid 0x579e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dr-provision2017/09/27 03:16:15.028534 Received DHCP packet: type Discover xid 0x589e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dr-provision2017/09/27 03:16:19.093171 Received DHCP packet: type Discover xid 0x599e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dr-provision2017/09/27 03:16:27.167050 Received DHCP packet: type Discover xid 0x5a9e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dang.

2017-09-27 03:17
No dice. However under Subnets/Leases I see a brand new lease listed.

greg
2017-09-27 03:17
Make sure you enable the subnet if you are testing it

2017-09-27 03:18
dr-provision2017/09/27 03:17:45.724011 Received DHCP packet: type Discover xid 0x599e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dr-provision2017/09/27 03:17:45.724446 Subnet ens192: handing out existing lease for 192.168.1.101 to MAC:00:50:56:9e:6e:cb
dr-provision2017/09/27 03:17:53.798103 Received DHCP packet: type Discover xid 0x5a9e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb
dr-provision2017/09/27 03:17:53.798614 Subnet ens192: handing out existing lease for 192.168.1.101 to MAC:00:50:56:9e:6e:cb
Good catch Greg.

2017-09-27 03:18
But no pxeboot. :(

greg
2017-09-27 03:18
Yeah - your node isn't accepting the DHCP reply for some reason, it appears.

2017-09-27 03:19
No matter - handing the pxeboot off to the firewall works for me while I'm fiddling around.

greg
2017-09-27 03:19
You should see an ACK and Request message.
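
Spotting a client that never gets past Discover is easy to automate. The Python sketch below counts DHCP message types per MAC address in log text shaped like the lines pasted in this conversation; the pattern is inferred from those lines only, so other DRP versions may log differently. `message_types_by_mac` and `stuck_in_discover` are hypothetical helpers.

```python
import re
from collections import Counter

# Matches the "Received DHCP packet" entries as pasted above. The lazy
# ".*?" keeps each match inside one entry even if entries share a line.
ENTRY = re.compile(
    r"Received DHCP packet: type (\w+) xid 0x[0-9a-f]+ .*? chaddr ([0-9a-f:]{17})"
)

def message_types_by_mac(log_text: str) -> dict:
    """Return {mac: Counter({message_type: count})}."""
    counts = {}
    for msg_type, mac in ENTRY.findall(log_text):
        counts.setdefault(mac, Counter())[msg_type] += 1
    return counts

def stuck_in_discover(counts: dict) -> list:
    """MACs that sent Discover but never followed up with a Request."""
    return sorted(
        mac for mac, types in counts.items()
        if types["Discover"] > 0 and types["Request"] == 0
    )

sample = (
    "dr-provision2017/09/27 03:17:45.724011 Received DHCP packet: type Discover "
    "xid 0x599e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb\n"
    "dr-provision2017/09/27 03:17:53.798103 Received DHCP packet: type Discover "
    "xid 0x5a9e6ecb ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:50:56:9e:6e:cb\n"
)
print(stuck_in_discover(message_types_by_mac(sample)))
```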

2017-09-27 03:19
Have to head home now, will have another crack getting further tomorrow.

greg
2017-09-27 03:19
Make sure that the other DHCP servers are off as well in the future (just in case).

greg
2017-09-27 03:20
cool - though DRP is designed to work in the pfSense case as well.

wdennis
2017-09-27 03:20
I'm using pfSense as the DHCP server, handing off PXE boot to DRP

greg
2017-09-27 16:05
@lae - it turns out that `template` doesn't do variable expansion on its parameters.

greg
2017-09-27 16:05
So, I wrote my own.

greg
2017-09-27 16:05
This will be in the next release. I'm putting it in tip now.

greg
2017-09-27 16:05
This can be dangerous.

greg
2017-09-27 16:06
We don't prevent loops

monkey
2017-09-27 16:46
yo - wow im on slack with you still

monkey
2017-09-27 16:46
did any check show up or do i need to call the bank :slightly_smiling_face:

zehicle
2017-09-27 17:51
@monkey moving this to 1x1

lae
2017-09-27 20:56
@greg are you basically just defining your own implementation for `template` on rendering the template? (just so I understand what's going on in that commit)

greg
2017-09-27 21:08
Yeah - the golang text template call doesn't know about templates to recurse (it seems like it should, but it doesn't). So, that is what I implemented.

greg
2017-09-27 21:08
@lae - I was waiting for the build to finish.

greg
2017-09-27 21:08
but here we go.

greg
2017-09-27 21:09
Tip is updated to have CallTemplate, which works like template but can take parameters.

greg
2017-09-27 21:10
This is "dangerous" and probably why text template doesn't really support it. You can create infinite loops. I'm not opposed to this. One day I may get around to writing a loop detector, but it isn't there now.
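
For a sense of what a loop detector could look like, here is a toy expander in Python. It is not DRP's CallTemplate and not Go `text/template`; the `[[include NAME]]` marker, the `render` function, and the template names are all invented for this sketch. The idea is to carry the chain of templates currently being expanded and refuse to re-enter one that is already on it.

```python
import re

# Toy syntax: [[include name.tmpl]] or [[include $param-name]].
INCLUDE = re.compile(r"\[\[include ([^\]\s]+)\]\]")

def render(name, templates, params, _stack=()):
    """Expand include markers recursively, raising on a cycle."""
    if name in _stack:
        chain = " -> ".join(_stack + (name,))
        raise RecursionError(f"template loop: {chain}")
    if name not in templates:
        raise KeyError(f"no template named {name}")

    def expand(match):
        target = match.group(1)
        if target.startswith("$"):
            # A leading $ means: take the template name from a parameter.
            target = params[target[1:]]
        return render(target, templates, params, _stack + (name,))

    return INCLUDE.sub(expand, templates[name])

templates = {
    "top.tmpl": "start [[include $template-selector]] end",
    "blue-pill.tmpl": "This is the blue-pill.",
    "a.tmpl": "[[include b.tmpl]]",
    "b.tmpl": "[[include a.tmpl]]",
}
print(render("top.tmpl", templates, {"template-selector": "blue-pill.tmpl"}))
```

Tracking the active chain rather than a global "seen" set means a template may still be included twice side by side; only genuine cycles such as `a.tmpl` including `b.tmpl` including `a.tmpl` are rejected.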

greg
2017-09-27 21:10
Here is an example content file.

greg
2017-09-27 21:11
@greg uploaded a file: https://rackn.slack.com/files/U02DGQYK1/F7B17T1AB/template-test.yml and commented: Example CallTemplate content

greg
2017-09-27 21:11
To import it, you will need tip installed.

greg
2017-09-27 21:11
```drpcli contents create - < template-test.yml```

greg
2017-09-27 21:11
With this in place, I do this:

greg
2017-09-27 21:12
```drpcli machines create '{ "Name": "fred", "Address": "1.1.1.1", "Stage": "template-viewer" }'```

greg
2017-09-27 21:13
This creates a fake machine with stage set to `template-viewer` which implicitly sets the bootenv to `template-viewer`

greg
2017-09-27 21:14
```drpcli machines list Name=fred```

greg
2017-09-27 21:14
Get the UUID and do this:


greg
2017-09-27 21:22
should get this: ``` This is a test template. We are doing parameterized injection of templates. This is the empty template with no pills added. ```

greg
2017-09-27 21:23
```drpcli machines set f17a55fe-c415-4f38-88dd-6fdfb2ce0c8d param template-selector to blue-pill```

greg
2017-09-27 21:23
will get this:

greg
2017-09-27 21:23
``` This is a test template. We are doing parameterized injection of templates. This is the blue-pill. ```

greg
2017-09-27 21:23
setting `template-selector` to `red-pill` will get something different

greg
2017-09-27 21:24
You can probably guess.

greg
2017-09-27 21:25
Setting it to `green-pill` will show template in template expansion.

greg
2017-09-27 21:25
``` This is a test template. We are doing parameterized injection of templates. This is the green-pill. Is both pills at once. This is the blue-pill. This is the red-pill. ```

greg
2017-09-27 21:26
Setting a second variable `bonus-pill`

greg
2017-09-27 21:26
```drpcli machines set f17a55fe-c415-4f38-88dd-6fdfb2ce0c8d param template-bonus-pill to bonus-pill.tmpl```

greg
2017-09-27 21:26
Will import a template from within a template dynamically.

greg
2017-09-27 21:26
`bonus-pill` is a full template name.

greg
2017-09-27 21:28
setting `bonus-pill` to `centos-7.ks.tmpl` will embed the kickstart file in test template

greg
2017-09-27 21:28
This is how I often test my templates quickly.

greg
2017-09-27 21:29
@lae @shane hope that helps. :slightly_smiling_face:

shane
2017-09-27 21:31
@greg - that's awesome - nice job whipping this up so quickly ... hoping it solves @lae request

greg
2017-09-27 21:31
The pattern works for tasks too. Actually, it is really useful for tasks because tasks don't have a direct render point like bootenvs do.

shane
2017-09-27 21:31
though I'm concerned .... you select "red" pill and "blue" pill and combined pill is "green". Shouldn't it be "purple" ??? :stuck_out_tongue_winking_eye:

greg
2017-09-27 21:32
That isn't how light sabers work.

greg
2017-09-27 21:32
And yes the hate is flowing.

greg
2017-09-27 21:32
:slightly_smiling_face:

shane
2017-09-27 21:32
:slightly_smiling_face:

greg
2017-09-27 21:33
I had this earlier, but was waiting on the build to finish. Had to work around sourceforge being partially down.

lae
2017-09-27 21:45
lol

lae
2017-09-27 21:46
ok - just updated my deployment to tip (yay ansible) - will try `CallTemplate` in a bit

lae
2017-09-27 21:59
(it works)

lae
2017-09-27 22:00
\o/

greg
2017-09-27 22:12
:slightly_smiling_face:

zehicle
2017-09-27 22:38
if I'm reading this thread - we just added dynamic workflow to DRP. I thought v3.2 was more of a cleanup release :stuck_out_tongue:

zehicle
2017-09-27 22:38
very exciting stuff

stanchan740
2017-09-28 01:49
has joined #json

wdennis
2017-09-29 18:00
@shane / @greg - would it be possible to create a video based on the already-published vid of Terraform/DRP integration, showing creation of the stages / workflows / parameters used to do the install at Packet?

shane
2017-09-29 18:01
hey @wdennis - you can see the drpcli calls I made in that video in the github content - it's all posted there


shane
2017-09-29 18:02
the "demo-run.sh" is just the front-end driver to the process which controls the bin/control.sh script - that script makes all of the direct calls to do the work

shane
2017-09-29 18:02
we have the "5min-drp" video posted in youtube - but it doesn't directly show the drpcli commands

wdennis
2017-09-29 18:02
It's not the 5-min DRP install one, but the Terraform integration one @greg did the demo on

wdennis
2017-09-29 18:05
https://youtu.be/5bxcpmxQXx4 - interested in how the stuff used at around 5:00 in the vid was created

shane
2017-09-29 18:06
stage maps ?

shane
2017-09-29 18:06
those are plumbed in via the 5min-drp cli calls too

wdennis
2017-09-29 18:07
Well, how to create the stages 1st, then tie them together in maps (workflows?)

shane
2017-09-29 18:08
sure - the JSON blob that's used to inject the stages ... sort of shows it ... we're working on getting better doc on the 3.1 features on the website around that

shane
2017-09-29 18:08
you can also piece stages together in the UI ... so for first time work - it's a good process to follow

shane
2017-09-29 18:08
then you can pull the JSON blob that gets built from the UI

wdennis
2017-09-29 18:09
I'll review the 5-min vid in the meantime


wdennis
2017-09-29 20:47
problem - restarted DRP (isolated mode) and am getting a TLS Exception error when trying UI...


greg
2017-09-29 20:50
reclick it- your certs should be generated.

greg
2017-09-29 20:50
could have been

wdennis
2017-09-29 20:50
No good

wdennis
2017-09-29 20:50
Still getting error

greg
2017-09-29 20:51
Hard reset the browser window? Not sure.


wdennis
2017-09-29 20:55
Did restart browser - still getting the error...

greg
2017-09-29 20:55
Did you click through to https://192.168.1.148?

wdennis
2017-09-29 20:55
Yes

wdennis
2017-09-29 20:58
How to "Accept Certificate"?

greg
2017-09-29 20:58
advanced button usually.

greg
2017-09-29 20:58
The browser should warn saying self-signed cert.

wdennis
2017-09-29 21:00
If I go to the DRP server's URL, it just flips me back to the RackN UX


greg
2017-09-29 21:02
Then you've accepted the cert

wdennis
2017-09-29 21:02
The DRP server software gens a SSL cert?

greg
2017-09-29 21:03
yes to do ssl

wdennis
2017-09-29 21:03
Where located?

greg
2017-09-29 21:03
in the directory where you started drp

greg
2017-09-29 21:03
server.crt and server.key

wdennis
2017-09-29 21:04
I see 'server.crt' and 'server.key' in my base DRP directory, but they have old dates...

greg
2017-09-29 21:05
we will reuse them.

greg
2017-09-29 21:05
It is most likely something else.

greg
2017-09-29 21:05
does the cli work?

wdennis
2017-09-29 21:06
Yes, drpcli works


greg
2017-09-29 21:08
You closed the browser and reopened it?

greg
2017-09-29 21:08
Log out from the Saas Part (in the upper right corner) and refresh everything.

wdennis
2017-09-29 21:09
Yup, then even rebooted browser host system

wdennis
2017-09-29 21:10
Ok, signed out then back in to RackN UX site

wdennis
2017-09-29 21:10
Now will use sandwich menu and select my endpoint

wdennis
2017-09-29 21:11
Still getting TLS Exception

greg
2017-09-29 21:11
just use the DRP endpoint redirect. I'm worried that some cookies or something are getting in the way.

greg
2017-09-29 21:11
Because it may be that you need to enter https://

greg
2017-09-29 21:11
it is supposed to add it, but it may not be working.

wdennis
2017-09-29 21:12
Did that, no good still

greg
2017-09-29 21:12
That is why I want you to log out of the RackN Portal. Then use the DRP redirect to see if you can access the DRP pages.

wdennis
2017-09-29 21:13
I did

greg
2017-09-29 21:13
okay

wdennis
2017-09-29 21:13
Trying another client system now

wdennis
2017-09-29 21:15
Port 8092 right?

greg
2017-09-29 21:15
yes

wdennis
2017-09-29 21:16
Browser can't connect on this one...

greg
2017-09-29 21:17
firewall rules?

wdennis
2017-09-29 21:18
Oh goddamn it

wdennis
2017-09-29 21:18
YES

wdennis
2017-09-29 21:18
Stupid docker

wdennis
2017-09-29 21:22
Note to self: find way of perma-clearing docker-created iptables rules on my DRP host

wdennis
2017-09-29 21:23
So, that's weird that the UX was throwing a TLS error when really it couldn't connect to the endpoint...

greg
2017-09-29 21:24
apparently. It is the normal cause for failure.

greg
2017-09-29 21:24
I think it is a guess and points to the most likely failure.

wdennis
2017-09-29 21:30
'systemctl stop firewalld; systemctl mask firewalld' FTW

lae
2017-09-30 20:55
or just, `firewall-cmd --zone=public --add-port=8092/tcp --permanent && firewall-cmd --reload`

lae
2017-09-30 20:55
anyway, I just happened to stumble onto this, happy birthday to rackn's twitter account

lae
2017-09-30 20:55
or well whoever's birthday the rackn's twitter account is set to

zehicle
2017-10-01 03:46
Thanks! Tomorrow is our official 3 year anniversary

zehicle
2017-10-01 03:46
we just updated the Ansible integration to make it easier. Demo is of Kubernetes deploy. https://youtu.be/b5himGQ1Zew

wdennis
2017-10-02 16:46
@greg If one has requests for future functionality, open an issue on GitHub?

shane
2017-10-02 16:48
@wdennis - yep - just open a new "issue" and `Label` it appropriately (eg "enhancement", "bug", etc) https://github.com/digitalrebar/provision/issues/new

wdennis
2017-10-02 16:51
I don't think this functionality exists, but - is there any flags / tags / indicators that a given node is undergoing a DRP reinstall (workflow), and/or event generated when the node re-registers with DRP and changes the bootparam to "local" when finished?

wdennis
2017-10-02 16:53
Use case is that I'm doing reinstalls to remote nodes where there is no remote console access, and I'd like confirmation that the node is undergoing reinstall by DRP (confirm the PXE boot and installer kickoff) and would like a notification when the process completes

wdennis
2017-10-02 17:04
@shane ^^^ do you know?

shane
2017-10-02 17:06
are you just looking for a visual clue ?

shane
2017-10-02 17:07
the UI shows the current BootEnv on the Machines page - you can also get this via the drpcli machines command for the given machine you want to reference

shane
2017-10-02 17:07
you can also use the events from the websockets - in the UI this is the "Announce" bullhorn icon in the upper left corner

praful
2017-10-02 17:17
has joined #json

shane
2017-10-02 17:18
welcome @praful

praful
2017-10-02 17:19
Thanks

shane
2017-10-02 17:26
@wdennis - you can get additional logging information from the `dr-provision` binary itself in a couple of ways (which would include the API calls you might be interested in):
* in production mode - systemd logging of events will catch the API calls
* in isolated mode - you have to redirect stdout to a file (as opposed to running `dr-provision` in the foreground)
* the slack plugin can also catch the websocket events - and you can push them to a specific slack channel

shane
2017-10-02 17:37
@wdennis - if you wanted to watch a given machine - you can use the `drpcli machines wait ...` command to "wait" until a given field changes to a given value ... for example - if you wanted to watch for a "BootEnv" change to "local" (meaning it transitioned from a previous BootEnv), do something like: ```drpcli machines wait <machine_uuid> BootEnv local```
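
A minimal sketch of wrapping that wait in a notification, for the no-console case described above. It assumes `drpcli machines wait <uuid> <field> <value>` blocks until the field matches and exits non-zero on failure or timeout (not confirmed in this thread). The function name `notify_when_local` and the plain `echo` notification are hypothetical placeholders.

```shell
# Block until a machine's BootEnv becomes "local", then report it.
# Assumes drpcli is on PATH and already pointed at the DRP endpoint.
notify_when_local() {
    uuid="$1"
    if drpcli machines wait "$uuid" BootEnv local >/dev/null; then
        echo "machine $uuid finished: BootEnv is local"
    else
        echo "wait failed for machine $uuid" >&2
        return 1
    fi
}
```

Swapping the `echo` for a mail or chat webhook call would turn this into the "notify me when the reinstall completes" piece.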

zehicle
2017-10-02 17:50
@wdennis that was the goal w/ the overview page w/ stages. That's what stages do. it's also subscribed to the websocket, so no refresh required

zehicle
2017-10-02 17:51
you're looking for a machine stage going back to "complete" or "complete-nowait"

zehicle
2017-10-02 17:51
I think we've been demoing this function 1x1 but there's no video yet

wdennis
2017-10-02 18:41
@shane Yes, I've been refreshing the machines page in the UI and waiting for the bootenv to change to 'local'...

wdennis
2017-10-02 18:42
But when I was doing the IPMI reboot, I was wishing for some visual indication that the node had started the PXE/reinstall process

wdennis
2017-10-02 18:43
(In other words, had contacted the DRP server and PXE-booted from it)

wdennis
2017-10-02 18:43
Hard to know with no console access what is going on...

wdennis
2017-10-02 18:46
@zehicle I am very interested in docs / vid howtos on the new workflow / stages / tasks / jobs UX functionality

zehicle
2017-10-02 18:59
the terraform demo shows stages - not the primary part of the demo, but you'll see it there

zehicle
2017-10-02 18:59
it's on my short list to do

shane
2017-10-02 19:00
stages are also shown (again - indirectly) in the 5min-drp video as well

zehicle
2017-10-02 19:00
:slightly_smiling_face:

zehicle
2017-10-03 02:02
@wdennis I did a quick video (ok, it ended up being 20 minutes) of using the stages workflow.

zehicle
2017-10-03 02:02

wdennis
2017-10-03 02:09
Thanks, @zehicle !

zehicle
2017-10-03 02:11
the workflow requires the RackN stages/bootenvs because those include tasks.

wdennis
2017-10-03 02:12
BTW, on rebar.digital site, the Starting > Documentation (icon) > Documentation (link) is broken - goes to a readthedocs "pages does not exist yet" page...

wdennis
2017-10-03 02:12
And PDF docs is v3.0.1

wdennis
2017-10-03 02:41
So was trying to pick up the RackN "os-discovery" content pack, but when I try to transfer it, am getting this error:


shane
2017-10-03 02:42
```
New layer violates key restrictions:
keysCannotBeOverridden: sledgehammer is already in layer 0
keysCannotOverride: sledgehammer would be overridden by layer 0
keysCannotBeOverridden: discovery is already in layer 0
keysCannotOverride: discovery would be overridden by layer 0
```

shane
2017-10-03 02:42
yep

wdennis
2017-10-03 02:42
I believe it's b/c I upgraded from v3.0 and already had s'hammer

shane
2017-10-03 02:42
it's a conflict with multiple content packs providing some of the same content types

wdennis
2017-10-03 02:43
Trying to follow @zehicle 's new vid, and I think I need that to get proper stages

shane
2017-10-03 02:43
yes - you do need it for stages

wdennis
2017-10-03 02:44
So, do I need to delete something(s) so I then can get the content pack?

shane
2017-10-03 02:45
I believe you have to destroy the existing content that conflicts - but this brings up a larger issue we need to sort out internally -- with the content packs and conflicting names

zehicle
2017-10-03 02:47
if you have pre-existing items with overlapping names, the content will not install.

zehicle
2017-10-03 02:47
unless it is from the same content

wdennis
2017-10-03 02:48
I've been running DRP since v3.0.1 and have upgraded my way to v3.1

wdennis
2017-10-03 02:49
So any way to resolve at present?

wdennis
2017-10-03 02:51
Or, I only have two custom profiles that I use - can I export them, nuke my current DRP isolated tree, and re-install then import my profiles?

zehicle
2017-10-03 02:51
assuming you are taking a backup before trying anything....

zehicle
2017-10-03 02:52
you may be able to just delete the conflicting items

wdennis
2017-10-03 02:52
Oh sure

zehicle
2017-10-03 02:52
which are from community content anyway

wdennis
2017-10-03 02:55
Is it the s'hammer ISO and the discovery bootenv?

zehicle
2017-10-03 02:55
ISO should be fine. it's the bootenvs

zehicle
2017-10-03 02:55
discovery & sledgehammer. It's in the message you posted

wdennis
2017-10-03 02:56
Ah, I see there's a s'hammer bootenv as well

wdennis
2017-10-03 02:56
Ok

zehicle
2017-10-03 02:56
need to get that content preview page working.

wdennis
2017-10-03 02:57
The UX is really coming along nicely :)

wdennis
2017-10-03 02:57
It's fun to log in and see changes every day

zehicle
2017-10-03 03:01
thanks!

wdennis
2017-10-03 03:05
Hmmm, looks like trying to delete 'discovery' and 'sledgehammer' bootenvs from UX not working...



zehicle
2017-10-03 03:06
do you have machines using those bootenvs?

wdennis
2017-10-03 03:06
Ah, yes I do...

wdennis
2017-10-03 03:06
(No error thrown?)

zehicle
2017-10-03 03:07
it may eat the error incorrectly.

zehicle
2017-10-03 03:07
that would be a bug to log.

wdennis
2017-10-03 03:07
No wait, they are all right now set to 'local'

wdennis
2017-10-03 03:08
But the defaults in system prefs used them

wdennis
2017-10-03 03:09
I've changed this to other bootenv's temp, let's see now...

wdennis
2017-10-03 03:09
Yup, that was it

wdennis
2017-10-03 03:10
Ok, cool, got the updated ones from the content pack now

wdennis
2017-10-03 03:11
Along with the stages

wdennis
2017-10-03 03:13
Hmmm, but now the new bootenvs can't be selected in the Global Setup system pref's...

shane
2017-10-03 03:14
I believe you need to make sure the BootEnvs are fully functional first - eg make sure the ISOs are loaded

shane
2017-10-03 03:15
go to the BootEnvs page and make sure each bootenv is "check" (ok) - not X (bad)

wdennis
2017-10-03 03:15
Ah, they are "X"'d

wdennis
2017-10-03 03:17
So, have to move the sledgehammer-___.tar file into place in the file system?

shane
2017-10-03 03:17
if you're sure it's the latest - you can move it to the tftpboot/isos/ directory

shane
2017-10-03 03:17
or use the `drpcli bootenvs uploadiso sledgehammer` command

wdennis
2017-10-03 03:18
That will get it from the 'net?

shane
2017-10-03 03:18
that'll pull a fresh/latest copy from the rackn repo

shane
2017-10-03 03:19
yes - you can run it on your laptop to pull-from-net-to-your-laptop-then-push-to-your-endpoint

shane
2017-10-03 03:19
or you can run direct from endpoint if your DRP endpoint has inter-tubes access

wdennis
2017-10-03 03:19
It does, and am doing it now...

shane
2017-10-03 03:20
same thing for any of the other BootEnvs you may need/want to pull in - for example `drpcli bootenvs uploadiso centos-7.3.1611-install` (note the lack of the `ce-` prefix - making this a RackN distributed ISO image)

wdennis
2017-10-03 03:21
Yes, :+1::skin-tone-2:

wdennis
2017-10-04 01:23
DR folk, if I scp a ISO into tftpboot/isos/ then the DRP system will pick up on it and mark the bootenv as :white_check_mark:

wdennis
2017-10-04 01:24
??

wdennis
2017-10-04 01:24
My `drpcli bootenvs uploadiso ...` is timing out

shane
2017-10-04 01:27
do you have the endpoint/username/password set to point to your DRP Endpoint correctly ?

shane
2017-10-04 01:29
...but to answer your question ... yes, wget/curl/scp/rsync/whatever an ISO to the `tftpboot/isos/`

wdennis
2017-10-04 01:30
I mean it works (I see the bits being pulled if I do an `iftop`) but it eventually times out

shane
2017-10-04 01:30
directory - then kill w/ HUP signal the DRP server (eg `killall -s HUP dr-provision`)

shane
2017-10-04 01:30
this will force it to re-read directories and explode the iso

shane
2017-10-04 01:30
IPtables rules ... :wink:

wdennis
2017-10-04 01:31
It fails with the message `Error: Error uploading <foo>.iso: context deadline exceeded`

shane
2017-10-04 01:32
we have seen that - and it's an incorrect timing bug - if the DRP endpoint is just a tiny bit slow responding

shane
2017-10-04 01:32
@greg is aware of this one

wdennis
2017-10-04 01:32
It did it multiple times yesterday eve trying to get the ubuntu 16.04 ISO

shane
2017-10-04 01:33
you can avoid this by running the drpcli command directly on your Endpoint, or downloading the ISO via the URL and dropping it (with correct name) in to the tftpboot/isos/ directory

wdennis
2017-10-04 01:34
I did run the drpcli on the endpoint?

shane
2017-10-04 01:34
DRP does need a kick in the pants to re-read the isos directory to recognize the ISO showing up (eg the HUP signal)

shane
2017-10-04 01:34
ok - only solution I have for you right now is the DL into the isos directory - that will work

wdennis
2017-10-04 01:35
SO the correct path (on isolated) is `./drp-data/tftpboot/isos` correct?

shane
2017-10-04 01:35
yep exactly

wdennis
2017-10-04 01:35
Cool

shane
2017-10-04 01:36
you can pull the URL path from the bootenv spec if you need it

shane
2017-10-04 01:37
for example `drpcli bootenvs show ubuntu-16.04-install | jq '.OS.IsoUrl'`

shane
2017-10-04 01:37
```
[root@5min-drp-ewr1-00 isos]# drpcli bootenvs show ubuntu-16.04-install | jq '.OS.IsoUrl'
"http://mirrors.kernel.org/ubuntu-releases/16.04/ubuntu-16.04.3-server-amd64.iso"
```

shane
2017-10-04 01:38
or .... `wget $(drpcli bootenvs show ubuntu-16.04-install | jq -r '.OS.IsoUrl') && killall -s HUP dr-provision`
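
Since a partial download dropped into `tftpboot/isos/` would only fail later, a checksum gate before the HUP may be worth having. This is a sketch only: `iso_sha_ok` is a hypothetical helper, and where the expected SHA-256 comes from (the bootenv spec or the mirror's checksum file) is an assumption not covered in this thread.

```shell
# Return 0 if the file's SHA-256 matches the expected hex digest.
iso_sha_ok() {
    file="$1"
    expected="$2"
    actual="$(sha256sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Hypothetical usage: only HUP dr-provision when the download is intact.
# iso_sha_ok ubuntu-16.04.3-server-amd64.iso "$EXPECTED_SHA" \
#     && killall -s HUP dr-provision
```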

wdennis
2017-10-04 01:38
nice :slightly_smiling_face:

wdennis
2017-10-04 01:38
UNIX FTW

shane
2017-10-04 01:38
hell yeah baby !!

shane
2017-10-04 01:39
if you're watching the dr-provision output, you'll see something like:
```
dr-provision2017/10/04 01:39:02.048432 Reloading data stores...
dr-provision2017/10/04 01:39:17.612217 Reload Complete
```

shane
2017-10-04 01:39
the first step takes a short bit as it explodes the ISO and stages the bits

wdennis
2017-10-04 01:42
Just +1'd https://github.com/digitalrebar/provision/issues/437 - need to clean out the older Ubuntu 16.04.2 ISO + tree now that DRP has moved up to using 16.04.3

shane
2017-10-04 01:43
oye! ...recognize that one... :slightly_smiling_face:

wdennis
2017-10-04 01:44
@shane Any idea if DRP will eventually do some sort of node hardware inventory via sledgehammer? (I think DRv2 did this)

shane
2017-10-04 01:44
what's your use case for that ? the short answer is quite probably ... but what exactly do you mean ?

shane
2017-10-04 01:44
inventory out to ... some 3rd party CMS/CMDB ?

wdennis
2017-10-04 01:46
I'd love it if, for instance, you have a pool of available hw in DRP that you could pass hardware requirements for a deployment, such as minimum node memory, # vCPUs, availability of GPUs, ...

wdennis
2017-10-04 01:47
I think of the "facts" that something like Ansible `setup` module returns (more than DRP needs, but has most of the relevant hw details)

wdennis
2017-10-04 01:47
If they could be set as node properties

shane
2017-10-04 01:48
we have some nascent ansible inventory export capabilities that @zehicle has been momming along

wdennis
2017-10-04 01:48
Yes, have seen that, very cool

wdennis
2017-10-04 01:49
But what I'm thinking of I don't want to tie into Ansible (or Puppet/Chef/whatever)

shane
2017-10-04 01:49
this can also be done in a roll-your-own fashion by applying parameters to `machines` and then you can build your own DevOps/CfgMgmt tooling that can query the DRP API to get a list of ready-state machines ... but right now that is definitely possible only with a bit of hand polishing

shane
2017-10-04 01:50
(amending that to read: "ready-state machines with specific parameters ... ")

wdennis
2017-10-04 01:50
It would be cool if the post-install routine of s'hammer that registers the node in DRP could also set some parameters as to hardware characteristics

shane
2017-10-04 01:52
definitely possible with a little polish on your own with use of stages - you could write your own `stage` that calls a `task` to register a set of `parameters` back to DRP ... that's the nice thing about the stages/stage map solution

shane
2017-10-04 01:53
it's relatively easy to inject new steps in the process

wdennis
2017-10-04 01:53
Hmmm... interesting

shane
2017-10-04 01:53
the `task` would simply collect inventory for you, presumably right after the `discover` stage
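
As a rough sketch of what such an inventory-collecting `task` body could gather on a Linux discovery image: the key names `inventory/ram-mib` and `inventory/cpus` are purely illustrative, and how the resulting JSON gets pushed back to the machine's parameters is deliberately left out, since the exact call is not shown in this thread.

```shell
# Print total memory in MiB from /proc/meminfo-style input on stdin.
mem_total_mib() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }'
}

# Emit a small JSON object of hardware facts (key names are illustrative).
inventory_json() {
    mem="$(mem_total_mib < /proc/meminfo)"
    cpus="$(grep -c '^processor' /proc/cpuinfo)"
    printf '{"inventory/ram-mib": %s, "inventory/cpus": %s}\n' "$mem" "$cpus"
}
```

Keeping the parsing in a function that reads stdin makes it easy to check against saved `/proc/meminfo` samples from different hardware.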

shane
2017-10-04 01:54
it could report back to an External Node Classifier (ENC), or to register specific params to DRP (essentially making it a poor-mans ENC)

shane
2017-10-04 01:54
after discovery - you can do a wait .... or you could do a burn-in workload .... or you could do a full install ... just depends how you want to piece together the stage map

shane
2017-10-04 01:55
for your process and workflow

shane
2017-10-04 01:56
I'd suggest that an ENC is a better thing to use for collecting inventory and asset data for re-use, but once a `machine` is registered, there's nothing stopping you from applying `parameters` to that machine which future jobs can pick up on and utilize

shane
2017-10-04 01:57
it's certainly a valid use case to place discovered machines in to some sort of "ready-state" that shows they've been discovered, verified, and classified some how

wdennis
2017-10-04 01:57
ENC is a new concept to me - found http://reclass.pantsfullofunix.net which is an interesting example which I shall explore...

shane
2017-10-04 01:57
yeah - reclass is really cool - but unfortunately, I think it's fallen out of maintenance in the last 2 years or so ...

shane
2017-10-04 01:58
I've pinged Martin (the author) of that recently to get a feel for the status of it, but so far, haven't gotten a response from him

shane
2017-10-04 01:58
though that was only a few days ago .... :slightly_smiling_face:

shane
2017-10-04 01:59
reclass is interesting because there are integrations for ansible/puppet/saltstack making it relatively ubiquitous

wdennis
2017-10-04 02:00
Yeah, I see what you mean from the GH insights graphs

wdennis
2017-10-04 02:09
OK, that worked :slightly_smiling_face:

shane
2017-10-04 02:09
woot !!

shane
2017-10-04 02:10
@wdennis...gotta roll, the Mrs. is home and we're going to go grab some chow - any other burning issues ?

wdennis
2017-10-04 02:10
Got the DR-provided `os-discovery` and `os-linux` content packs transferred and all objects :white_check_mark:

shane
2017-10-04 02:10
awesome

wdennis
2017-10-04 02:10
No, thx again for your assistance!

shane
2017-10-04 02:11
no prob - feel free to ping us if anything else comes up ... I'll take a look at it when I get back

wdennis
2017-10-04 02:11
Kewl - enjoy dinner

wdennis
2017-10-04 03:15
FYI, brought up a new DRP endpoint on Packet; transferred content packs, did `drpcli bootenvs uploadiso [discovery,ubuntu-16.04-install,centos-7.3.1611-install]`, all of the bootenvs uploaded were :white_check_mark:, but when I checked Stages, the stages depending on those bootenvs were still :x:

wdennis
2017-10-04 03:16
Had to HUP dr-provision to get them to go :white_check_mark:

shane
2017-10-04 03:58
:slightly_smiling_face: yep - stages has a small bug; the state of the bootenv/tasks/etc. they rely on does not get re-evaluated correctly ... so you have to HUP, or remove content/re-install content after installing the stages - known issue, and on the radar to get fixed ASAP

shane
2017-10-04 03:58
I believe @greg is on this one too

shane
2017-10-04 04:01
@wdennis - just checked with our feature plans; and it looks like we have inventory parameter feedback on the roadmap for pushing inventory items back during discover or similar stage. If there are specific elements you are looking for that would be useful, please share them in an enhancement ticket so we can consider capturing those as enhancements for 3.2 or 3.3

wdennis
2017-10-04 04:05
Will do

wdennis
2017-10-04 04:06
Replicated @zehicle's Workflow vid using my Packet account, pretty damn cool...

shane
2017-10-04 04:06
yep - packet is nifty

wdennis
2017-10-04 04:07
Jealous of their bare metal speeds :flushed:

shane
2017-10-04 04:07
yeah, they have some really nice SSD hardware in them - I know the type2's are running the Samsung PRO models - super fast

wdennis
2017-10-04 04:09
I will open an enhancement issue on GH for the inventory parameter stuff - was thinking CPU type / family, RAM amount, NIC speed, GPU presence

greg
2017-10-04 04:18
Nic speed is a little tricky. Usually a range. Usually list max capable? Similar with GPU because of driver support. Lspci or lshw can report them. I'm still thinking

shane
2017-10-04 04:20
```
[root@5min-drp-ewr1-00 tftpboot]# ethtool enp0s20f0 | grep Speed:
        Speed: 2500Mb/s
```

shane
2017-10-04 04:21
`ethtool` will almost always reliably return the actual link speed, along with other things like supported speeds
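
For an inventory task, the same value can be pulled out without the label. A minimal sketch, with `link_speed` as a hypothetical helper name; it reads `ethtool <iface>` output on stdin and reports whatever the link negotiated, so a port with no link will not yield a usable number.

```shell
# Extract the negotiated link speed from `ethtool <iface>` output on stdin.
link_speed() {
    awk '$1 == "Speed:" { print $2 }'
}

# Usage on a live system:
# ethtool enp0s20f0 | link_speed
```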

greg
2017-10-04 04:21
I've found that isn't always good because it relies on phy state

greg
2017-10-04 04:22
Better than nothing

shane
2017-10-04 04:22
true - but ...

shane
2017-10-04 04:22
exactly what I was going to say

greg
2017-10-04 04:24
In dr we had a ruby wrapper on some ioctls that got the max bits from the driver. It was fine until we got 40gb adapters and blew our code up. Moving targets and all that

shane
2017-10-04 04:25
...and it looks like http://packet.net is doing something interesting with BW on the NIC on the switch side to set it at 2.5 Gbps

shane
2017-10-04 04:25
maybe a single 10 Gbps NIC shared via 4 nodes ? something goofy with the Atom hardware

shane
2017-10-04 04:25
(type0)

greg
2017-10-04 04:26
Yeah. Shared nic backplane?

shane
2017-10-04 04:26
something like that

zehicle
2017-10-04 20:29
ALL - we're officially starting the Beta program around the RackN UX and advanced content. If you've registered for the RackN site (it's NOT required to use the UX) then you're in the Beta.

zehicle
2017-10-04 20:30
We could use help spreading the word about the project and what we're doing to advance physical ops. If you are inclined, please tag @digitalrebar on social media and say good things about the project. It will help us build a community and sustain the project.

zehicle
2017-10-04 20:30
Thanks!

chermack
2017-10-04 21:13
advanced content or advanced packages?

zehicle
2017-10-04 21:17
Same thing.

wdennis
2017-10-05 03:27
@zehicle done

greg
2017-10-05 15:46
@wdennis - how do you check for and enumerate GPUs?

wdennis
2017-10-05 16:01
@greg Primitive, but... `lspci -Q | grep -i vga` then eyeball output...

wdennis
2017-10-05 16:01
example:
```
root@ml17-pc04:~# lspci -Q | grep -i vga
05:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
06:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
09:00.0 VGA compatible controller: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] (rev a1)
```

wdennis
2017-10-05 16:04
Another example (different GPUs):
```
root@snake06:~# lspci -Q | grep -i vga
01:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation GM200 [GeForce GTX TITAN X] (rev a1)
```

greg
2017-10-05 16:05
okay - figured something like that but was wondering to be sure.

wdennis
2017-10-05 16:06
Of course, that still works on a system without GPUs... example:
```
[root@ml53 ~]# lspci -Q | grep -i vga
06:01.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200eW WPCM450 (rev 0a)
```
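
One way to avoid the eyeballing is to filter out the onboard/BMC video controllers. This is a heuristic sketch, not a complete classifier: the vendor exclusion list (Matrox, ASPEED) is an assumption about typical server boards, and `list_gpus` / `count_gpus` are hypothetical helper names. It also keeps "3D controller" entries, since some compute cards report that class instead of VGA.

```shell
# Filter `lspci` output (stdin) down to likely discrete GPUs.
# Heuristic: keep VGA/3D controllers, drop common onboard/BMC video vendors.
list_gpus() {
    grep -Ei 'VGA compatible controller|3D controller' |
        grep -Eiv 'Matrox|ASPEED'
}

# Print how many likely GPUs were found.
count_gpus() {
    list_gpus | grep -c .
}

# Usage on a live system:
# lspci | count_gpus
```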

greg
2017-10-05 16:09
Yeah - but that may be sufficient as a hint and info. It might even be drivable.

greg
2017-10-05 16:09
Cool thanks

wdennis
2017-10-05 16:11
For NVIDIA GPUs only:
```
root@ml17-pc04:~# nvidia-smi -L
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-04875823-43f4-f49e-2f5f-f6027ab4cabf)
GPU 1: GeForce GTX 1080 Ti (UUID: GPU-451a1806-3b88-8ea4-73ba-6bffe562ade0)
GPU 2: GeForce GTX 1080 Ti (UUID: GPU-aa755a59-d4f2-b8e6-124a-6a0f6a2cb5ed)
```

wdennis
2017-10-05 16:13
of course, would not work on AMD / Intel... (we only use NVIDIA for GPU computing...)

greg
2017-10-05 16:15
sure - interesting - I'm hoping for more generic, but useful to know

greg
2017-10-05 17:45
Hi - updated tip content to include a new sledgehammer with some additional tools for disk manipulation. GPT partitions and the like. If you import new content, you will need to update sledgehammer. ```drpcli bootenvs uploadiso ce-discovery``` would fix it.

carl
2017-10-05 18:42
I still can't get sledgehammer to expand on `v3.1.0-0-b70cf8ee1f61844a6d64070a8b272c2bec512204`: looks like `explode_iso.sh` is still unhappy. I'm running on CentOS 7.4 and here is the error from the UI:
```
Command output: Explode iso sledgehammer/b3c09ebd5a9c228c66d8a617b6f5d10ccbe1c273 /var/lib/dr-provision/tftpboot /var/lib/dr-provision/tftpboot/isos/sledgehammer-b3c09ebd5a9c228c66d8a617b6f5d10ccbe1c273.tar /var/lib/dr-provision/tftpboot/sledgehammer/b3c09ebd5a9c228c66d8a617b6f5d10ccbe1c273
Extracting /var/lib/dr-provision/tftpboot/isos/sledgehammer-b3c09ebd5a9c228c66d8a617b6f5d10ccbe1c273.tar for sledgehammer/b3c09ebd5a9c228c66d8a617b6f5d10ccbe1c273
vmlinuz0: OK
stage1.img: OK
stage2.img: OK
/usr/sbin/selinuxenabled
```
selinux is in permissive mode

carl
2017-10-05 18:47
The ce-ubuntu-16.04-install and the ce-centos-7.3.1611-install images worked fine after a SIGHUP to dr-provision; those have a checkmark on boot environments. ce-sledgehammer and ce-discovery both still have Xs.

shane
2017-10-05 18:48
hi carlp - I'm looking in to this right now

vlowther
2017-10-05 20:25
@carl based on that output from explode_iso for installing Sledgehammer, it looks like restorecon failed for some reason.

vlowther
2017-10-05 20:25
Can you run it against /var/lib/dr-provision/tftpboot and see what happens?

carl
2017-10-05 20:32
```
[kumulus@koiab dr-provision-install]$ sudo restorecon /var/lib/dr-provision/tftpboot
[sudo] password for kumulus:
[kumulus@koiab dr-provision-install]$ echo $?
0
[kumulus@koiab dr-provision-install]$
```

vlowther
2017-10-05 20:32
ok, that is as expected.

vlowther
2017-10-05 20:33
What do the last few lines of your explode_iso.sh look like?

vlowther
2017-10-05 20:33
(should be /var/lib/dr-provision/tftpboot/explode_iso.sh)

carl
2017-10-05 20:34
```
printf '%s' "$expected_sha" > "${os_install_dir}.extracting/.${os_name}.rebar_canary"
[[ -d "${os_install_dir}" ]] && mv "${os_install_dir}" "${os_install_dir}.deleting"
mv "${os_install_dir}.extracting" "${os_install_dir}"
rm -rf "${os_install_dir}.deleting"
if which selinuxenabled && selinuxenabled; then
    restorecon -R -F "$tftpboot"
fi
```

carl
2017-10-05 20:37
Interestingly:
```
[kumulus@koiab dr-provision-install]$ sudo selinuxenabled ; echo $?
0
```

vlowther
2017-10-05 20:39
ya, even in permissive mode selinuxenabled will return 0

vlowther
2017-10-05 20:39
It only returns failure when selinux is disabled.

vlowther
2017-10-05 20:40
ya, you are affected by a bug that @shane fixed yesterday.

vlowther
2017-10-05 20:41
my amazing typo skills wrote $tftpboot instead of $tftproot :slightly_smiling_face:

vlowther
2017-10-05 20:44
The latest tip release should fix that issue.

carl
2017-10-05 22:25
awesome, that fixed it. Thanks!

shane
2017-10-05 23:01
We hope to see all of you at the next Meetup, scheduled for Tuesday October 10th at 11am PST.
Agenda Doc: https://docs.google.com/document/d/1FRFI-vONJY9yje9UsBqCI8XhojJ0XARsFgs4jbm-VRk
Also - vote if you'd like to see the meetup move to a weekly cadence:
Poll: https://www.meetup.com/digitalrebar/polls/1255504/
Meetup page: https://www.meetup.com/digitalrebar/


lae
2017-10-06 11:30
the event log on the DRP web UI seems to be overflowing below the browser

lae
2017-10-06 11:31

greg
2017-10-06 13:22
I'll add an issue for it. Thanks!

carl
2017-10-06 17:13
New day, new problems. I can't seem to get the default CentOS 7 install to work. Machine downloads the kernel and initrd and then seems to reboot

carl
2017-10-06 17:13
The default Ubuntu install seems to work

carl
2017-10-06 17:17
Nevermind - seems to be one particular machine just doesn't like me.

shane
2017-10-06 17:41
ok ... we aren't minding! :slightly_smiling_face: let us know if you need any help with anything

mfischer
2017-10-08 04:26
has joined #json

mfischer
2017-10-08 21:56
I seem to be missing a step in the directions. Did the quickstart but I'm not sure what I need to do in order to get my master node to reply to pxe requests. I get file not found

mfischer
2017-10-08 22:00
aha there's no default.ipxe installed in tftpdir

mfischer
2017-10-08 22:11
hmm discover-load.sh seems to be gone

greg
2017-10-08 22:42
Yeah - follow the output of install.sh.

greg
2017-10-08 22:42
Also, for pxe requests, make sure you set the defaultUnknownBootEnv to discovery or ce-discovery.

greg
2017-10-08 22:42
This is what serves default.ipxe.

greg
2017-10-08 22:42
@mfischer - forgot to tag at the start.

mfischer
2017-10-08 22:43
I figured the part out about not ignoring after I posted the question

mfischer
2017-10-08 22:43
I think it's doing something, but with packet who knows. I'm messing around with that kernel param

greg
2017-10-08 22:43
We have some doc updates coming. Just got ahead of ourselves.

greg
2017-10-08 22:43
Ah - okay . A couple of things.

greg
2017-10-08 22:44
Sign up for a RackN account (if you haven't already). Then you can get packet content and packet-ipmi.

mfischer
2017-10-08 22:44
yep I'm just now going to go play with packet IPMI so I dont have to use packet commands

mfischer
2017-10-08 22:44
is it a plugin?

greg
2017-10-08 22:44
Adding these and setting stage workflow will put the right kernel params in place.

greg
2017-10-08 22:44
It is two pieces.

greg
2017-10-08 22:45
packet content - this adds some tasks and stages.

mfischer
2017-10-08 22:45
I have that

greg
2017-10-08 22:45
packet ipmi plugin - this adds actions to "packet discovered" nodes so that the system can issue reboot and on/off calls.

greg
2017-10-08 22:45
Okay - so you probably want to use a discover stage as the default stage.

mfischer
2017-10-08 22:45
when I click Add plugin it's just spinning...

greg
2017-10-08 22:46
hmm it can take a little bit, but not too long.

mfischer
2017-10-08 22:46
default stage = packet-discover

greg
2017-10-08 22:46
let me check quick

mfischer
2017-10-08 22:46
unknown default = discover

mfischer
2017-10-08 22:47
ok I'll give it 2-3 min

greg
2017-10-08 22:48
okay - packet-discover can not be the initial stage. I should probably change that.

mfischer
2017-10-08 22:48
ok will change, still no joy on loading plugins

greg
2017-10-08 22:48
You need to change discover to packet-discover.

greg
2017-10-08 22:48
If possible can you check the console of your browser and see if there is an error.

mfischer
2017-10-08 22:48
wait do you mean change p-discover to discover?

greg
2017-10-08 22:49
It worked for me, but we keep tweaking the saas interface.

greg
2017-10-08 22:49
yeah

greg
2017-10-08 22:50
Okay - this is what I usually do for playing with packet.

greg
2017-10-08 22:51
1. unknownbootenv -> discovery

greg
2017-10-08 22:51
2. knownbootenv -> sledgehammer

mfischer
2017-10-08 22:51
I don't see any errors in the console after clicking add

greg
2017-10-08 22:51
3. defaultStage -> discover

greg
2017-10-08 22:51
4. In the workflow pane, I set the following relations on the global profile.

greg
2017-10-08 22:51
a. discover -> packet-discover:Success

greg
2017-10-08 22:52
b. packet-discover->terraform-ready:success

greg
2017-10-08 22:52
c. centos-7.3.1611-install->packet-ssh-keys:success

greg
2017-10-08 22:52
d. packet-ssh-keys->complete-no-wait:success
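
Steps 1 to 3 above are server preferences rather than workflow entries. A minimal sketch of setting them from the CLI, assuming the preference names `unknownBootEnv`, `defaultBootEnv` and `defaultStage` and the `drpcli prefs set` subcommand (the chat only names them loosely, so confirm against your drpcli version first). The `run` wrapper is a hypothetical helper that only prints the command when `DRY_RUN=1`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, run it otherwise.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Assumed preference names; the chat calls them unknownbootenv,
# knownbootenv and defaultStage.
set_packet_prefs() {
  run drpcli prefs set \
    unknownBootEnv discovery \
    defaultBootEnv sledgehammer \
    defaultStage discover
}
```

Running `DRY_RUN=1 set_packet_prefs` shows the command without touching the endpoint.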

mfischer
2017-10-08 22:52
let me install terraform too

greg
2017-10-08 22:53
I find terraform-ready to be a nice intermediate wait state.

greg
2017-10-08 22:53
You can "auto" install things by changing the terraform-ready-state to `centos-7.3.1611-install:Reboot`

mfischer
2017-10-08 22:53
there's also plugin providers which shows packet IPMI

greg
2017-10-08 22:53
oh - so it seems to work, but you didn't get UI confirmation.

mfischer
2017-10-08 22:53
clicking Transfer doesnt do anything

greg
2017-10-08 22:54
I also do in workflow.

mfischer
2017-10-08 22:54
should I have some logs in my dr-p server?

greg
2017-10-08 22:54
e. ubuntu-16.04-install -> packet-ssh-keys:success

greg
2017-10-08 22:54
You may have some jobs.

greg
2017-10-08 22:55
jobs will accrue logs.

greg
2017-10-08 22:55
If you install drp in production mode, drp's log is in systemd/journalctl

mfischer
2017-10-08 22:55
yeah, I get it in stdout now

greg
2017-10-08 22:56
Now for IPMI packet to work, you will need to enable IPMI packet by creating a plugin.

greg
2017-10-08 22:56
You can click plugins, add button. It will ask which provider, choose packet-ipmi.

mfischer
2017-10-08 22:56
you know logging out and back in loses your Endpoint?

greg
2017-10-08 22:56
That will ask you for your API key. Set that in the parameter and click add.

greg
2017-10-08 22:57
okay - good to know. @zehicle is working on that.

mfischer
2017-10-08 22:57
I think I need to work out why I can't add any plugin providers

greg
2017-10-08 22:57
I think it should be getting saved on the org/user in the "cloud", but we've had some issues.

greg
2017-10-08 22:58
I thought you said you had one. I'm sorry.

mfischer
2017-10-08 22:58
they're available but not installed

mfischer
2017-10-08 22:58
wait nm

greg
2017-10-08 22:58
When all else fails, I hit the cli.


mfischer
2017-10-08 22:59
they're installed I think

mfischer
2017-10-08 22:59
unsure what transfer does, clicking it doesnt do much

greg
2017-10-08 22:59
```drpcli plugin_providers list```

greg
2017-10-08 22:59
ok - checking on my side.

mfischer
2017-10-08 22:59
I did top of tree

mfischer
2017-10-08 23:00
list is []

mfischer
2017-10-08 23:00
let me redo this w/o top of tree after a bio break

greg
2017-10-08 23:00
I see what is happening.

greg
2017-10-08 23:02
@zehicle - it looks like `http://rackn.github.io` and ` https://qww9e4paf1.execute-api.us-west-2.amazonaws.com/main/catalog/plugins/` are having cross-site scripting wars again.

greg
2017-10-08 23:02
@mfischer - I'm going to attach it here. Take the binary and drop it in the drp-data/plugins directory. ok?

mfischer
2017-10-08 23:02
ok or just post a link

mfischer
2017-10-08 23:02
and I 'll curl it

mfischer
2017-10-08 23:02
wget I mean

greg
2017-10-08 23:03
okay - let me see.

mfischer
2017-10-08 23:03
pasting it works too

greg
2017-10-08 23:03
linux?

mfischer
2017-10-08 23:03
my master node is ubuntu


greg
2017-10-08 23:04
wait

greg
2017-10-08 23:04
ugh - wrong one.


greg
2017-10-08 23:04
almost the same, but this is the latest.

mfischer
2017-10-08 23:04
lol Fox just started a ticker for our Winter storm warning

greg
2017-10-08 23:05
where are you?

greg
2017-10-08 23:05
what does that mean, like 40s?

greg
2017-10-08 23:05
or is it real.

mfischer
2017-10-08 23:05
no 3-6" of snow and its real, i'm in colorado

mfischer
2017-10-08 23:05
beautiful now

greg
2017-10-08 23:05
wow - nice!

greg
2017-10-08 23:05
Just got home from daughter's soccer game in the 90s

mfischer
2017-10-08 23:06
ok thats installed

mfischer
2017-10-08 23:06
./drpcli plugin_providers list --> []

mfischer
2017-10-08 23:06
chmod a+x?

greg
2017-10-08 23:06
yes

mfischer
2017-10-08 23:06
yep that fixed it

greg
2017-10-08 23:06
the upload path does that for you.
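
The manual workaround above (drop the provider binary into the plugins directory, then `chmod a+x`) can be sketched as one small function. This is only an illustration: the function name is made up, and the default `drp-data/plugins` path is taken from the message above and may differ on a production install.

```shell
#!/usr/bin/env bash
# Sketch: copy a plugin provider binary into the plugins directory and
# make it executable, which the upload path would normally do for you.
# Function name and default directory are illustrative assumptions.
install_plugin_provider() {
  local binary="$1"
  local plugin_dir="${2:-drp-data/plugins}"
  mkdir -p "$plugin_dir"
  cp "$binary" "$plugin_dir/"
  chmod a+x "$plugin_dir/$(basename "$binary")"
}
```

After that, `drpcli plugin_providers list` should no longer return `[]`, as it did in the exchange above once the execute bit was set.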

greg
2017-10-08 23:06
okay cool

greg
2017-10-08 23:07
now you can add the plugin

mfischer
2017-10-08 23:07
BTW the UX in here said "Choose undefined"

mfischer
2017-10-08 23:07
in the last field

greg
2017-10-08 23:08
yeah that is okay.

greg
2017-10-08 23:08
It should have a packet-api-key field first.

mfischer
2017-10-08 23:08
yep I added it

greg
2017-10-08 23:09
ok - good . now. Did you do the workflow changes.

mfischer
2017-10-08 23:09
need to go back and finish

greg
2017-10-08 23:09
ok cool

mfischer
2017-10-08 23:09
oh wait I need terraform

mfischer
2017-10-08 23:10
where does that come from again?

mfischer
2017-10-08 23:10
nm found it

mfischer
2017-10-08 23:10
costs me $$$

greg
2017-10-08 23:11
does it?

mfischer
2017-10-08 23:12
well it says $1 but you dont have my CC so ...

mfischer
2017-10-08 23:13
but maybe for real users

greg
2017-10-08 23:13
let me check - I can't remember what we decided

mfischer
2017-10-08 23:13
os-other costs me $2

mfischer
2017-10-08 23:16
I still have a hard time with the machine menu, there's no reboot option for example

mfischer
2017-10-08 23:16
you can force and mark it runnable but its not the same

greg
2017-10-08 23:17
Yeah - those are support costs. Though at the current moment we should talk because I suspect that group licenses and support

greg
2017-10-08 23:17
Yeah you need to get the machine to have packet-uuid.

greg
2017-10-08 23:17
Set the machines stage to discover and manually reboot it.

mfischer
2017-10-08 23:17
rob said on the podcast the base costs are $1/node/mo

mfischer
2017-10-08 23:17
ok

greg
2017-10-08 23:19
For the most part, we are trying to figure out how to charge for this sanely. I think ipmi baremetal support is that way. We have a la carte prices listed. Bundling will most likely be the better case. Also, we'd love to know what you are trying to do. :slightly_smiling_face:

greg
2017-10-08 23:19
We can also take this to PM if need.

mfischer
2017-10-08 23:21
I wanted to try out some of the features beyond basic bare metal mgmt

mfischer
2017-10-08 23:22
no specific projects yet

zehicle
2017-10-08 23:23
@mfischer no CC required - billing is on committed use. We're not putting a paywall up yet, if you are using it, esp beta, then figure out the license $. $1/node/month is the base for support

mfischer
2017-10-08 23:25
yeah I didn't figure as much

mfischer
2017-10-08 23:25
at this point you'd have to hunt me down anyway since you don't have my CC :wink:

zehicle
2017-10-08 23:25
@greg I'll look in a minute and figure out.

greg
2017-10-08 23:25
np - I gave direct link and we are moved beyond.

mfischer
2017-10-08 23:26
is watching the packers/cowboys game while doing this

greg
2017-10-08 23:26
Yeah - me too.

mfischer
2017-10-08 23:27
I am not a cowboys fan

greg
2017-10-08 23:27
i am

mfischer
2017-10-08 23:27
I figured since you guys are in texas

greg
2017-10-08 23:27
all support is now cutoff

mfischer
2017-10-08 23:27
lol

greg
2017-10-08 23:27
j/k

mfischer
2017-10-08 23:27
HTTR

mfischer
2017-10-08 23:27
well now you can help again :disappointed:

greg
2017-10-08 23:27
lol - sigh

mfischer
2017-10-08 23:28
@greg my node made it to terraform-ready, I need to setup the ubuntu stage you mentioned

greg
2017-10-08 23:28
okay - so - did you upload the ubuntu iso already

mfischer
2017-10-08 23:28
yeah

greg
2017-10-08 23:29
okay - so os-linux content

greg
2017-10-08 23:29
needs to be loaded.

mfischer
2017-10-08 23:30
I think I loaded it, let me look

greg
2017-10-08 23:30
check stages

mfischer
2017-10-08 23:30
no, not there, let me add

mfischer
2017-10-08 23:31
should the WF be 100% connected? it seems like there's a hole

greg
2017-10-08 23:31
good catch

mfischer
2017-10-08 23:31
between terraform and an OS

greg
2017-10-08 23:31
That is dependent upon your choices.

mfischer
2017-10-08 23:32
rugby ending

greg
2017-10-08 23:32
Yeah -

greg
2017-10-08 23:33
Right now `terraform-ready` acts as a holding cell.

mfischer
2017-10-08 23:33
so, I deleted a stage that was in my WF and now WF won't load

greg
2017-10-08 23:33
You can then set the stage to which os you want and reboot the node.

greg
2017-10-08 23:33
awesome! or not.

greg
2017-10-08 23:34
```drpcli profiles show global```

mfischer
2017-10-08 23:34
can I just wipe the global?

mfischer
2017-10-08 23:34
destroy

greg
2017-10-08 23:34
please don't do that.

mfischer
2017-10-08 23:35
done

mfischer
2017-10-08 23:35
oops

mfischer
2017-10-08 23:35
lol

greg
2017-10-08 23:35
lol

mfischer
2017-10-08 23:35
its fine now

greg
2017-10-08 23:35
```drpcli profiles create global```

mfischer
2017-10-08 23:35
just starting over

mfischer
2017-10-08 23:35
its back in the GUI

mfischer
2017-10-08 23:35
oh wait no its now

mfischer
2017-10-08 23:35
s/now/not/

greg
2017-10-08 23:35
Please put a global profile back. please.

mfischer
2017-10-08 23:35
yep done

greg
2017-10-08 23:36
I need to make that not deletable.

greg
2017-10-08 23:36
brb

mfischer
2017-10-08 23:36
hmmm GUI still wont load the workflow page, let me check console

mfischer
2017-10-08 23:37
jQuery.Deferred exception: Cannot read property 'change-stage/map' of null TypeError: Cannot read property 'change-stage/map' of null

mfischer
2017-10-08 23:37
I has an idea

greg
2017-10-08 23:38
```drpcli profiles set global param change-stage/map to '{}'```

mfischer
2017-10-08 23:38
I was going to throw in the old JSON values with the offending key removed

greg
2017-10-08 23:38
that works too

mfischer
2017-10-08 23:39
that fixed it ^

mfischer
2017-10-08 23:41
okay so last q for a bit

mfischer
2017-10-08 23:41
do unknown machines default in to the global work flow?

greg
2017-10-08 23:41
Yeah - so the flow is

greg
2017-10-08 23:41
unknown machine gets the unknown boot env.

greg
2017-10-08 23:43
discovery adds the node to the system and when the node is created, it gets the default stage and/or default known bootenv. The default stage of discover adds some tasks that start processing the change-stage/map. This will drive it through the stage chains which run the tasks.

greg
2017-10-08 23:43
Now, you asked about the "hole" in the flow between `terraform-ready` and `centos-7.3.1611-install`.

greg
2017-10-08 23:43
This is a starting point to make sure flow is mostly right.

greg
2017-10-08 23:45
You could just as well change `packet-discover->terraform-ready:success` to `packet-discover->centos-7.3.1611-install:Reboot` and it will just flow straight through to an installed node.
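
The straight-through flow described here can be sketched as a single `change-stage/map` value. The JSON shape (`"stage": "next:Action"`) is an assumption inferred from the arrow notation used in this conversation, and `build_stage_map` is a hypothetical helper, so treat this as an outline rather than the documented schema.

```shell
#!/usr/bin/env bash
# Build JSON for change-stage/map from "stage next:Action" argument pairs.
# The {"stage": "next:Action"} shape is an assumption, not a documented schema.
build_stage_map() {
  local first=1 stage next
  printf '{'
  while [ "$#" -ge 2 ]; do
    stage="$1"; next="$2"; shift 2
    [ "$first" -eq 1 ] || printf ', '
    printf '"%s": "%s"' "$stage" "$next"
    first=0
  done
  printf '}\n'
}

# Discovery flows directly into the CentOS install, with no holding stage.
stage_map="$(build_stage_map \
  discover packet-discover:Success \
  packet-discover centos-7.3.1611-install:Reboot \
  centos-7.3.1611-install packet-ssh-keys:Success \
  packet-ssh-keys complete-no-wait:Success)"

echo "$stage_map"
# To apply it, reuse the command form shown earlier for resetting the map:
#   drpcli profiles set global param change-stage/map to "$stage_map"
```

Swapping the second pair back to `packet-discover terraform-ready:Success` restores the holding stage.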

greg
2017-10-08 23:45
Depends upon what your goal and usage is.

mfischer
2017-10-08 23:45
yep

greg
2017-10-08 23:45
In terraform-ready, the node can be grabbed by the terraform provider and driven through an OS install.

mfischer
2017-10-08 23:45
makes sense

greg
2017-10-08 23:47
The terraform content creates a single "pool" like effect. The drp terraform provider uses API atomic ops to ensure that a single provider gets/reserves the machine and takes it out of the pool.

greg
2017-10-08 23:47
The model and flow could be extended to have named pools and other things. future enhancements and the like.

mfischer
2017-10-08 23:49
so minor q

greg
2017-10-08 23:49
okay - :slightly_smiling_face:

mfischer
2017-10-08 23:49
above you said Ubuntu:Reboot

mfischer
2017-10-08 23:49
doesnt it reboot after an OS install anyway?

greg
2017-10-08 23:50
okay so the reboot is the transition from the terraform-ready stage.

mfischer
2017-10-08 23:50
ah I see

greg
2017-10-08 23:50
The reboot is to escape the sledgehammer image and go to the os install image.

greg
2017-10-08 23:50
Yes, that is why the last stage `complete-no-wait:Success` doesn't need a reboot.

greg
2017-10-08 23:50
The os install script finishes and reboots the node.

greg
2017-10-08 23:51
If you want post-install actions that run in the newly booted OS, a new task/stage would need to be added to make sure that something that calls back into DRP gets run.

greg
2017-10-08 23:53
There are already examples of post-install actions that run in the install environment. ```packet-ssh-keys``` or ```ssh-access``` are examples of stages without boot environments that can be run in just about any bootenvironment to do their job.

greg
2017-10-08 23:53
So, you can chain them into the sequence to make sure keys are in place.

greg
2017-10-08 23:54
There is a lot of "power" in the pieces as you start writing your own tasks and stages. Or leveraging our growing libraries of tasks/stages.

mfischer
2017-10-08 23:55
that makes sense

greg
2017-10-08 23:55
IPMI BMC configuration is already a stage/content that can be added to manage your BMCs on real hardware (setting users, configure ips, setting remote control).

greg
2017-10-08 23:56
Some that are close and should show up soon are inventory, classification, hw raid configuration, bios configuration, and component update.

mfischer
2017-10-08 23:56
I need to step away for a bit

mfischer
2017-10-08 23:56
back in 10

greg
2017-10-08 23:56
We are even toying around with image-based installs to skip the kickstart styles as an option. All things on road map.

greg
2017-10-08 23:57
Cool - I too am wandering off. I'll be around some.

zehicle
2017-10-09 00:04
I've duplicated the plugins issue... working on it

zehicle
2017-10-09 00:09
The plugin issue has been resolved - it was a backend issue - no updates required. I will patch the UX to protect against this case.

zehicle
2017-10-09 00:09
there is apparently another issue w/ the system...plugins. looking at that

zehicle
2017-10-09 00:14
I've found the issue w/ the providers too... will take more time to fix. Basically, it does not find them if you've already loaded them.

mfischer
2017-10-09 15:05
@greg do you have a recommended video/URL/etc that would explain more of the concepts to me? For example, stages/workflow/jobs/plugins etc.

mfischer
2017-10-09 15:10
I'm going to redo the setup but this time instead of hacking I'd like to get a working process down and feel like I'm lacking some of the concepts that might make me successful

greg
2017-10-09 15:26
We need to build some of those. We are going to be talking about them a little tomorrow on the community call. We have some plans to do those shortly. I need to do some videos about that.

mfischer
2017-10-09 15:27
ok thanks

mfischer
2017-10-09 15:27
I'm going to go back through some of the process again today and try to learn a bit more. I'm especially interested in some of the post install stuff you have like kubespray

mfischer
2017-10-09 16:41
@greg remind me what magic is needed to get packet IPMI working? I installed it and added my API key

greg
2017-10-09 16:42
That is all from the plugin side.

greg
2017-10-09 16:42
The content side needs to have the `packet-discover` stage chained after the `discover` stage.

mfischer
2017-10-09 16:42
ok

mfischer
2017-10-09 16:43
so packet discover is how rackn knows I have a http://packet.net box

greg
2017-10-09 16:43
yes - the tasks in the stage do two things

greg
2017-10-09 16:43
1. test to see if the machine is a packet machine and set the packet uuid as a parameter on the node.

greg
2017-10-09 16:43
2. put the packet ssh keys into the discovery environment so that ssh access to the discovered nodes is allowed.

mfischer
2017-10-09 16:44
discover -> packet-disc:Success is what I will set

greg
2017-10-09 16:44
yes

greg
2017-10-09 16:44
then I do: packet-discover->terraform-ready:Success - this creates a hang env for a node.

mfischer
2017-10-09 16:45
I'm going to skip that and install ubuntu but concepts making more sense now

greg
2017-10-09 16:47
okay - cool

greg
2017-10-09 16:48
full disclosure, if you are watching the ssh console for a machine, you won't see log output until the machine reboots again. The running of packet-discover adds the serial console parameters for boot environs. That doesn't take effect until the next reboot.

wdennis
2017-10-09 16:58
@shane Upgraded the os-discovery Content, now see that sledgehammer needs upgrade - refresh my memory on how to upload?

greg
2017-10-09 17:02
```drpcli bootenvs uploadiso ce-sledgehammer```

greg
2017-10-09 17:02
is one of the ways.

greg
2017-10-09 17:02
UX can also do it.

wdennis
2017-10-09 17:11
Thx @greg

greg
2017-10-09 17:12
np

wdennis
2017-10-09 17:16
@greg - did param names change sometime lately?

greg
2017-10-09 17:16
a few, but you should have had it already.

wdennis
2017-10-09 17:17
For instance I have "access_keys" type object, but I now see "access-keys" type object...

greg
2017-10-09 17:17
yeah - we tried to normalize everything from `_` to `-`

greg
2017-10-09 17:17
we were all over the map.

wdennis
2017-10-09 17:18
Ah, OK - so should reset those...

greg
2017-10-09 17:18
yes

wdennis
2017-10-09 17:21
Ok. Also, a "Profile" seems to not be just a collection of Params, but now is a class of machines, as used in Workflows?

greg
2017-10-09 17:22
I'll have to describe that in a bit. have to step away.

wdennis
2017-10-09 17:23
Ok

mfischer
2017-10-09 19:46
@greg when I apply a profile to a machine does it need to re-pxe to get it applied? for example kube-master

mfischer
2017-10-09 19:50
ah /me finds the docs

mfischer
2017-10-09 21:28
where do I report bugs?


mfischer
2017-10-09 21:29
I think this is a RackN UI issue specifically

shane
2017-10-09 21:29
please use the labels related to UX appropriately (enhancement or bug) :slightly_smiling_face:

mfischer
2017-10-09 21:30
ok

mfischer
2017-10-09 21:36
just need to figure out how to label it

mfischer
2017-10-09 21:36
or maybe I cant?

shane
2017-10-09 21:37
when you create a new issue, there should be a "Label" pull down on the right - if not, you can go ahead and create it without a label, and apply the label after you create it

mfischer
2017-10-09 21:38
I think I need write access ... okay let me look again


mfischer
2017-10-09 21:38
I lack the cog wheel


shane
2017-10-09 21:38
hmm - will look into that - plz go ahead and file the issue, I'll label it

mfischer
2017-10-09 21:39
"Assign labels to issues and pull requests to help organize your projects. You can do this in repositories to which you have write access."

mfischer
2017-10-09 21:39
done


wdennis
2017-10-09 21:49
Running into a problem with my Ubuntu install...


wdennis
2017-10-09 21:50
How can I see generated preseed?

lae
2017-10-09 22:32

lae
2017-10-09 22:33
`{{.Machine.Path}}` will typically be http://$drphost:8091/machines/$uuid

lae
2017-10-09 22:34
so for that bootenv, the preseed can be fetched from e.g. http://provision.local:8091/machines/ffa946f1-d4fa-4b3d-a347-1e018904dd8e/seed

lae
2017-10-09 22:37
also I'd check the screen on alt+f8 (iirc) for any relevant error messages too

lae
2017-10-09 22:38
(or esc esc esc all the way until you can get back to the main menu prompt and open a shell/start a web server to view logs from somewhere else)

wdennis
2017-10-10 01:46
Thanks @lae

wdennis
2017-10-10 01:46
The URL seems to be: `url={{.Machine.Url}}/seed`

wdennis
2017-10-10 01:47
What does `{{.Machine.Url}}` translate into?

wdennis
2017-10-10 12:26
aha, figured it out... it's: `http://<endpoint_ip>:8091/machines/<machine_uuid>/seed`
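
As a small sketch of the URL pattern just worked out, the rendered preseed for a machine can be located like this. The function name is made up for illustration; port 8091 and the `/machines/<uuid>/seed` path come from the messages above.

```shell
#!/usr/bin/env bash
# Build the URL of the rendered preseed for one machine.
# Host and UUID are inputs; 8091 is the port quoted in the chat.
seed_url() {
  local endpoint_ip="$1" machine_uuid="$2"
  printf 'http://%s:8091/machines/%s/seed\n' "$endpoint_ip" "$machine_uuid"
}

# Example (not run here): show only the disk line of the rendered preseed.
#   curl -s "$(seed_url "$endpoint_ip" "$machine_uuid")" | grep partman-auto/disk
```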

wdennis
2017-10-10 12:28
So I think I see the problem here?

wdennis
2017-10-10 12:28
`d-i partman-auto/disk string /dev//dev/sda`

wdennis
2017-10-10 12:30
I have the `operating-system-disk` param in the profile I'm using set to `/dev/sda` - maybe should just be `sda` now? Did that change somewhere along the way recently?

wdennis
2017-10-10 12:32
Yes, setting that value to just `sda` corrects the generated `partman-auto/disk` line
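
A quick way to guard against this class of problem, sketched under the assumption that the template adds its own `/dev/` prefix (as the doubled path above suggests). Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Strip a leading /dev/ so the value suits a template that adds its own.
bare_disk_name() {
  printf '%s\n' "${1#/dev/}"
}

# Succeed if a rendered preseed (on stdin) contains a doubled /dev/ prefix.
has_doubled_dev() {
  grep -q '/dev//dev/'
}
```

For example, `bare_disk_name /dev/sda` yields `sda`, and piping the fetched seed file through `has_doubled_dev` flags the broken `partman-auto/disk` line.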

greg
2017-10-10 14:18
Yes - sigh - you are finding all the tweaks from 3.0.1 to 3.1.0 - most of those happened from v3.0.1 - > v3.0.3

greg
2017-10-10 14:20
I noticed that we were inconsistently using `operating-system-disk` across the different OSes . `sda` can work for both ubuntu and centos

wdennis
2017-10-10 16:38
@greg beta, baby! Bits in flight ;)

pton
2017-10-10 17:08
has joined #json

shane
2017-10-10 18:02
- meetup is starting now: https://zoom.us/j/3403934274

lae
2017-10-11 00:12
btw these are the changes I made in our content regarding debian/ubuntu, I guess some of this (particularly partitioning templates) might be useful upstream? I can open a PR containing some of these if you want https://gist.github.com/lae/1da54fd1abd2a56fa51f57fdd27de370

lae
2017-10-11 00:12
ugh wrong host

shane
2017-10-11 00:13
Hey @lae - we'd definitely be interested in incorporating some more capable partitioning template capabilities in to the Community Content - that's awesome - we'll look for the PR and review it with you - thanks !

lae
2017-10-11 00:14
ok, added gh link

shane
2017-10-11 00:14
(and any other enhancements you have :slightly_smiling_face: )

lae
2017-10-11 00:14
kk

wdennis
2017-10-11 02:31
It puts the line `echo "PermitRootLogin yes" >> /etc/ssh/sshd_config` in the post-install.sh file

wdennis
2017-10-11 02:32
This does not work on Ubuntu (but I think it does on CentOS/RHEL)

shane
2017-10-11 02:32
why not on ubuntu ?

wdennis
2017-10-11 02:33
Ubuntu already has a line `PermitRootLogin without-password` in the distro-provided sshd_config file in

shane
2017-10-11 02:33
ah - that would be an idempotency fail then

wdennis
2017-10-11 02:34
So you end up with two PermitRootLogin stanzas

wdennis
2017-10-11 02:34
It only takes the first

wdennis
2017-10-11 02:35
Here's the fix: `sed --in-place -r -e '/PermitRootLogin/ s/^#//' -e '/PermitRootLogin/ s/without-password/yes/' /etc/ssh/sshd_config`

wdennis
2017-10-11 02:36
That line works for both CentOS and Ubuntu

shane
2017-10-11 02:36
that won't work on Mac; (in place requires backup filename extension) - but something along those lines is exactly what I was just working on for you

wdennis
2017-10-11 02:36
Apple knows Best(tm)

shane
2017-10-11 02:36
Apple knows BSD ... which ... well ... never mind ...

wdennis
2017-10-11 02:37
I swear I had Greg change this in v3.0.x ?

wdennis
2017-10-11 02:37
(ran into the same problem)

greg
2017-10-11 02:38
in this case, the line would work but needs to be parameterized. Since those scripts really only run in the linux context, the mac issue is less problematic.

shane
2017-10-11 02:39
also fails with two existing PermitRootLogin entries already

greg
2017-10-11 02:39
@wdennis - I think I wanted to, but didn't get to it. We should open an issue to make sure it gets in.

shane
2017-10-11 02:39
or a commented out #PermitRootLogin and a valid entry

wdennis
2017-10-11 02:39
I think the stock CentOS sshd_config has the line `#PermitRootLogin yes` and Ubuntu has `PermitRootLogin without-password`

shane
2017-10-11 02:39
```
vagrant@drp:/tmp$ cat foobar
#PermitRootLogin
PermitRootLogin without-password
vagrant@drp:/tmp$ cat foobar | sed -r -e '/PermitRootLogin/ s/^#//' -e '/PermitRootLogin/ s/without-password/yes/'
PermitRootLogin
PermitRootLogin yes
```

greg
2017-10-11 02:40
yeah - @shane - really need to cut out all PermitRootLogin and put one in place.
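
The "cut out all PermitRootLogin and put one in place" idea can be sketched with awk. This is one possible approach, not the fix that shipped: `set_sshd_option` is a hypothetical helper, and it writes the single setting before the first `Match` block (or at the end of the file if there is none) so the option stays global.

```shell
#!/usr/bin/env bash
# Remove every existing line for KEY (active or commented out) and write a
# single "KEY VALUE" line before the first Match block, or at end of file.
set_sshd_option() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  awk -v key="$key" -v value="$value" '
    # Skip any existing setting for this key, commented out or not.
    $0 ~ ("^[ \t]*#?[ \t]*" key "([ \t]|$)") { next }
    # Global options must appear before the first Match block.
    !done && /^[ \t]*Match[ \t]/ { print key " " value; done = 1 }
    { print }
    END { if (!done) print key " " value }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage would look like `set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes`, followed by an sshd restart. Note that it also drops comment lines that begin with the key name, which is usually fine for the stock files discussed here.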

zehicle
2017-10-11 02:40
That's not a valid value for that param

wdennis
2017-10-11 02:41
I have been using that value for a while with v3.0.x

wdennis
2017-10-11 02:41
When did it change?

greg
2017-10-11 02:41
Yes is a valid value.

zehicle
2017-10-11 02:41
Sorry... listen to Greg and Shane. I didn't have context

wdennis
2017-10-11 02:42
n/p

greg
2017-10-11 02:42
The valid ones are: "without-password|yes|no|forced-commands-only"

greg
2017-10-11 02:42
just not root

zehicle
2017-10-11 02:43
Oh!! The ONE bad choice and that's what I picked. Sigh. That's beyond luck

shane
2017-10-11 02:43
oh god no - not stupid threading in slack ...

wdennis
2017-10-11 02:43
lol

greg
2017-10-11 02:43
yeah :wink:

wdennis
2017-10-11 02:44
2 - 2 - 2 conversations in one!

wdennis
2017-10-11 02:44
I know this was working back in v3.0 for me...

wdennis
2017-10-11 02:46
I know we're bad for wanting root logins with passwords in SSH, but... "that's the way it's always been"

lae
2017-10-11 02:51
I uh, actually use this in our post-install.sh template:
```
grep -q '^PasswordAuthentication ' /etc/ssh/sshd_config && sed -r -i 's/^(PasswordAuthentication).*/\1 no/' /etc/ssh/sshd_config || (echo 'PasswordAuthentication no' >> /etc/ssh/sshd_config)
grep -q '^PermitEmptyPasswords ' /etc/ssh/sshd_config && sed -r -i 's/^(PermitEmptyPasswords).*/\1 no/' /etc/ssh/sshd_config || (echo 'PermitEmptyPasswords no' >> /etc/ssh/sshd_config)
grep -q '^PubkeyAuthentication ' /etc/ssh/sshd_config && sed -r -i 's/^(PubkeyAuthentication).*/\1 yes/' /etc/ssh/sshd_config || (echo 'PubkeyAuthentication yes' >> /etc/ssh/sshd_config)
grep -q '^PermitRootLogin ' /etc/ssh/sshd_config && sed -r -i 's/^(PermitRootLogin).*/\1 no/' /etc/ssh/sshd_config || (echo 'PermitRootLogin no' >> /etc/ssh/sshd_config)
```

shane
2017-10-11 03:03
`sed -i.bak '/^PermitRootLogin /{h;s/ .*/ yes/};${x;/^$/{s//PermitRootLogin yes/;H};x}' /etc/ssh/sshd_config`

wdennis
2017-10-11 03:05
@greg I feel stupid opening an issue for this, but... here ya go: https://github.com/digitalrebar/provision/issues/480

shane
2017-10-11 03:09
my sed scriptlet doesn't address duplicate lines - but in sshd_config context - it would be valid to simply chuck a `| sort -u` after the sed runs

shane
2017-10-11 03:10
@wdennis and @lae - I'll address fixing/patching for both of your issues/suggestions (respectively) tmw morning ... going to have an evening with my family now ... :slightly_smiling_face:

wdennis
2017-10-11 03:11
I'm out too, nite!

lae
2017-10-11 03:13
@shane night! however I will mention that a `sort -u`, while probably not an issue in this case, would break any sshd_config files that have `Match` lines

greg
2017-10-11 03:13
Yeah

greg
2017-10-11 03:14
@lae adds some additional parameters for the sshd file.

greg
2017-10-11 03:14
We should also add a fail on the task-based versions if sshd doesn't start.

2017-10-11 12:44
hi there! can i ask some question as user, not dev? I try to use rebar in the 'home lab' and something wrong with arp tables

2017-10-11 12:44
https://pastebin.com/HWfinPLW

shane
2017-10-11 13:17
@kolomnitcki - you are definitely welcome to ask questions here as an operator - DRP is FOR operators. :slightly_smiling_face: What specifically is the issue you are seeing ?

2017-10-11 13:24
@rackneng well, i'm using a router as DHCP server with settings `next-server=192.168.10.14` where 192.168.10.14 is rebar vm ip address and `boot-file-name=lpxelinux.0`

2017-10-11 13:25
in `drpcli leases list` you can see two IP on one MAC `"00:0c:29:a4:f2:e1"`

2017-10-11 13:26
```
"Addr": "192.168.10.10",
"Token": "00:0c:29:a4:f2:e1",
...
"Addr": "192.168.10.11",
"Token": "00:0c:29:a4:f2:e1",
```

2017-10-11 13:28
but in fact `00:0c:29:a4:f2:e1` has `192.168.10.16` IP
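
Spotting this kind of duplicate by eye gets tedious on a bigger lab. A rough sketch that scans pretty-printed `drpcli leases list` output for tokens holding more than one address, assuming each lease prints its `"Addr"` line before its `"Token"` line as in the excerpt above (the function name is made up):

```shell
#!/usr/bin/env bash
# Read pretty-printed lease JSON on stdin and print every Token (MAC) that
# holds more than one Addr. Relies on Addr preceding Token in each lease.
tokens_with_multiple_leases() {
  awk -F'"' '
    $2 == "Addr"  { addr = $4 }
    $2 == "Token" { count[$4]++; addrs[$4] = addrs[$4] " " addr }
    END { for (t in count) if (count[t] > 1) print t ":" addrs[t] }
  '
}
```

Usage would be `drpcli leases list | tokens_with_multiple_leases`. A proper JSON tool is the sturdier choice if one is installed.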

shane
2017-10-11 13:30
Can you please provide the start up options for "dr-provision", and output of "drpcli subnets list"

2017-10-11 13:34
`drpcli subnets list` - https://pastebin.com/JrSJbjMh (is it ok that I'm using pastebin?)

shane
2017-10-11 13:35
sure no problem

shane
2017-10-11 13:36
and also "ps -ef | grep dr-provision", please

2017-10-11 13:37
```
[root@rebar ~]# ps -ef | grep dr-provision
root 1279 1 0 Oct10 ? 00:01:16 /usr/local/bin/dr-provision
```
i'm using dr-provision.service

shane
2017-10-11 13:39
Ah - so you have not disabled the DHCP service within `dr-provision` - so you have 2 DHCP servers handing out leases - that will cause lots of grief

shane
2017-10-11 13:41
you need to disable the DHCP service within `dr-provision` - please add `--disable-dhcp` to your service start

shane
2017-10-11 13:42
(that should be in `/etc/systemd/system/dr-provision.service`)
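
A minimal sketch of that change, assuming the unit file has a single `ExecStart=` line; the function name is illustrative and the unit path is passed in rather than hard-coded. It appends the `--disable-dhcp` flag once and leaves a `.bak` copy behind.

```shell
#!/usr/bin/env bash
# Append --disable-dhcp to the ExecStart line of a unit file, only once.
add_disable_dhcp() {
  local unit="$1"
  # Already present: nothing to do.
  grep -q -- '--disable-dhcp' "$unit" && return 0
  sed -i.bak -e '/^ExecStart=/ s/$/ --disable-dhcp/' "$unit"
}

# On a real host this would be followed by (not run here):
#   systemctl daemon-reload && systemctl restart dr-provision
```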

2017-10-11 13:47
@rackneng thanks

shane
2017-10-11 13:48
presumably that's sorting out the issues for you ? :slightly_smiling_face:

2017-10-11 13:55
i'm turning off dhcp via rackn UI for now

shane
2017-10-11 13:58
presumably you're doing that through disabling the Subnet ...

ctrees
2017-10-11 14:30
So, I signed up for http://packet.net and got flagged (maybe because I used my google account as phone ?)... anyway... I was wondering how to get some test credit AND I've attempted to take Rob's demo (where he's using packet for the k8s)

ctrees
2017-10-11 14:32
I had done this in the past (standing up DR and just provision) a few months ago so went back to a CentOS7 vm on VBOX ran into some cert issues...

ctrees
2017-10-11 14:35
My 'long term' is to create an OpenAFS 'community' package that can be used at some of the local universities... where a deploy to packet OR to a 'pile of old stuff' is the same for the prof

shane
2017-10-11 14:35
@ctrees - yeah, I got flagged by packet when I signed up w/ a rackn account, too ... so ... they aren't just singling you out :slightly_smiling_face:

shane
2017-10-11 14:35
not sure what their heuristics are .... but a bit overly aggressive, is my guess

shane
2017-10-11 14:36
possibly if you've used packet previously, they're trying to reduce duplicate accounts ? I had been a previous packet user at another company

ctrees
2017-10-11 14:36
Oh I sort of figured that... I think is that they just want a human in the loop when they take a CC

ctrees
2017-10-11 15:28
OK.. so went through Shane new video (Thanks, very helpful 4me).

shane
2017-10-11 15:29
for reference - that's the new Community Content video, at: http://bit.ly/2z029lo

ctrees
2017-10-11 15:36
A few questions: 1. - Ansible vs 'Local Shell Scripts'. So the package is more about local scripts template expansion ? 2. - Duplicating 'packet ipxe setup locally' Seems like a good idea esp for a test lab is to duplicate how DR-P works within packet (pattern wise).. ?? agree ??

ctrees
2017-10-11 15:38
I know DR-P works well with ansible / kubespray via Rob's demo, just wondering when is using template expansion on shell scripts becomes a preferred method.

shane
2017-10-11 15:40
@ctrees we believe firmly that you should use whatever Cfg Mgmt/DevOps tooling you prefer - ansible, saltstack, chef, puppet ... etc. so we are not pushing/using either technology for basic provisioning

shane
2017-10-11 15:41
the Demo that @zehicle has is related to enabling applications placement on top of DRP - more as a demo of utilizing a DRP provisioned cluster

shane
2017-10-11 15:41
it just happens he chose Ansible Kubespray as the deployment mechanism

shane
2017-10-11 15:42
we also have Virtualbox plugin for interoperating a little better with VB for a local lab setup on your own laptop

shane
2017-10-11 15:42
though - to be honest, I haven't yet played with it much - I do use VirtualBox on my Mac - without the plugin, and it works fine - you just have to "work around" the VirtualBox peculiarities of trying desperately to own DHCP all the time

shane
2017-10-11 15:43
I'm going to put together a video on Virtualbox as a lab setup here shortly - as a few of us at rackn use it too

ctrees
2017-10-11 15:44
Oh... wow... that's just what I was doing (setting up a VirtualBox again)

ctrees
2017-10-11 15:45
I was running into https algo issues:
[root@provision ~]# curl https://192.168.88.9:8092
curl: (35) Cannot communicate securely with peer: no common encryption algorithm(s).
[root@provision ~]# curl http://192.168.88.9:8091
<pre>
<a href="ALL-LICENSE">ALL-LICENSE</a>

ctrees
2017-10-11 15:47
yea... Vbox vnet has always been ?? iffy ?? combined with MACOSX can really make you want to force feed Debian to everyone :wink:

ctrees
2017-10-11 15:49
I do run mac myself basically because 'best popular' setup... I like to tell the mac people with issues "well OBVIOUSLY you're HOLDING IT WRONG" when they have issues...

shane
2017-10-11 15:50
exactly

shane
2017-10-11 15:50
curl --insecure ...

ctrees
2017-10-11 15:50
OH... thanks... how do I get the webbrowser to poke through ?

shane
2017-10-11 15:51

shane
2017-10-11 15:51
example API usage to set Preferences

shane
2017-10-11 15:51
note the "--insecure" in the curl output

shane
2017-10-11 15:51
not sure what you mean by "poke through" ?

ctrees
2017-10-11 15:52
I was hitting the UI on the IP with a browser (like Rob did in his setup demo)

ctrees
2017-10-11 15:55
I couldn't tell where it was getting blocked.... I 'suspect' maybe FW or SE... was just digging into it... but when Rob used a 'new machine' from packet... I figured.... well heck, I should just login to a new packet machine and figure out the starting env... then got to thinking... heck, this should be captured in a ks or ce-digital-rebar-baseline ??

ctrees
2017-10-11 15:56
need DR-P to setup a machine for DR-P :wink:

ctrees
2017-10-11 15:56
quick get your 'spinning inception top'

greg
2017-10-11 15:57
:slightly_smiling_face:

shane
2017-10-11 15:57
the web browser is a CORS application (cross origin resource sharing - https://en.wikipedia.org/wiki/Cross-origin_resource_sharing ) - so from your browser, you need outbound HTTPS to http://rackn.github.io (Portal), and then port 8092 access from your browser to your DRP Endpoint

shane
2017-10-11 15:58
actually ... I used the 5min-drp demo setup to spin up 10 http://Packet.net centos nodes - which I used to deploy DRP on one of them for the Community Content video .... :slightly_smiling_face:


shane
2017-10-11 16:00
I _believe_ the terraform-provider-packet plugin necessary is NOW available with the "always_pxe" option - and doesn't require additional compile to build it from the Beta release (i.e. it JUST released yesterday afternoon)

shane
2017-10-11 16:00
that demo requires the RackN registered (free) content be downloaded/available for the demo to work, since it uses some of the advanced Packet plugins and content to operate correctly

ctrees
2017-10-11 16:13
yea... I had stuff working months ago but the outbound to the portal is new, so that may be it... I was attempting to figure out just whether the servers were up (eliminate the UI thing)... which basically made me want to drop into an ansible script and do some register checks so I can 'knowledge transfer' a bit easier..

ctrees
2017-10-11 16:20
which is why I like the idea of capturing the packet pattern in ansible as I should be able to produce a vagrantfile that mucks it all into a 'laptop test-lab' setup (basically use the kubespray structure). I started to put a pfsense vm into the mix so I could use all vnet 'local only' and capture the SDN in ansible also...

ctrees
2017-10-11 16:52
needed one other switch for curl

ctrees
2017-10-11 16:52
[root@provision ~]# curl --ciphers ecdhe_ecdsa_aes_256_sha --insecure https://127.0.0.1:8092/ui
<a href="https://rackn.github.io/provision-ux/#/e/127.0.0.1:8092">Moved Permanently</a>.
[root@provision ~]# curl --ciphers ecdhe_ecdsa_aes_256_sha --insecure https://192.168.88.9:8092/ui
<a href="https://rackn.github.io/provision-ux/#/e/192.168.88.9:8092">Moved Permanently</a>.

shane
2017-10-11 16:53
hmm - ok - your curl version must be .... odd ....

ctrees
2017-10-11 16:54
yea, which was why I was attempting to figure out what exactly is that 'base line' code 'ya-all' install on

shane
2017-10-11 16:54
"all y'alls"

ctrees
2017-10-11 16:54
but this is 'refreshing' my memory of the networking which I need

shane
2017-10-11 16:54
(though I'm in California ... so .... )

ctrees
2017-10-11 16:55
mt.view or ?

shane
2017-10-11 16:55
yep

shane
2017-10-11 16:55
hence meetup address

ctrees
2017-10-11 16:56
pretty shiny ceiling... so I doubt you're in any old buildings I worked in...

shane
2017-10-11 17:09
(shhh ... I work from home ... )

ctrees
2017-10-11 17:10
oh is that a bad word now in silly valley (post yahoo)

ctrees
2017-10-11 17:12
so... if you were to guess... my web problems probably I don't have updated crypto on my vm.... so... 'yum update'... woops... apt-get update ? or ya think I have a more fundamental config issues ?

ctrees
2017-10-11 17:13
... I had this working but MONTHS ago...

ctrees
2017-10-11 17:15
I'm good with blowing everything away... but if that's the case, I'd like to know what to pull in as a baseline... I did attempt to follow: http://packet.rebar.digital/default.ipxe to figure out that ks setup...

shane
2017-10-11 17:16
are you looking to do all of this in packet ?

ctrees
2017-10-11 17:16
or could wait for packet to get back to me... or wait till you do vagrant ... either way

shane
2017-10-11 17:17
what is the user/account you registered under? I know the packet guys, and I can get them to fix that for you asap

ctrees
2017-10-11 17:17
ultimately it'll be setup as a CI for normal dev-ops... so laptop -> test -> pre-pro -> pro

shane
2017-10-11 17:17
I'd suggest restarting w/ the 5min-drp demo stuff in Packet - from scratch

ctrees
2017-10-11 17:17
but for sure I want to give them the option of 'well... just go test with packet'

ctrees
2017-10-11 17:18
ok... that's what I'll do...

shane
2017-10-11 17:18
the virtualbox stuff is a bit fragile ... mostly because of virtualbox itself

ctrees
2017-10-11 17:18
just a sec, let me check email... maybe they've gotten to me... it's

shane
2017-10-11 17:18
we need more time to polish those pieces up

ctrees
2017-10-11 17:20
I agree with vbox... and I've got lots of 'work-arounds' as the timer slips cause havoc with OpenAFS (Kerberos); you shouldn't have as many issues with that... but for sure the vnet crap is a pita...

shane
2017-10-11 17:21
I'm also working on some Vagrant based solutions - but again, I'm using VirtualBox as provider for vagrant .... so ....

ctrees
2017-10-11 17:21
but I for sure would use kubespray as a model... those guys have 'lots of eyes and hands'...

wdennis
2017-10-11 17:24
@shane What is the generated URL for a kickstart? (as opposed to a preseed) `{{.Machine.Url}}`/???

wdennis
2017-10-11 17:41
^^^ or @greg - like to be able to take a look at it while my install is happening here

shane
2017-10-11 17:49
@wdennis not sure off hand - checking

shane
2017-10-11 18:10
@wdennis I'm guessing it's something like: curl --insecure http://127.0.0.1:8091/machines/<MACHINE_UUID>/ks.cfg

shane
2017-10-11 18:11
but only because "ks.cfg" is the defacto standard for kickstart filename - not confirmed yet

shane
2017-10-11 18:11
also - once a job has been run fully, the file appears to be removed from the tftpboot tree (which is where this is served from)

shane
2017-10-11 18:12
so - you can't grab a successfully finished/installed version to review - it would only be during machine build (or potentially failed to build) stage you can grab it

lae
2017-10-11 18:22

lae
2017-10-11 18:31
@wdennis `drpcli bootenvs show $BOOTENVNAME | jq '.Templates'` should show you the templates for a bootenv, and you can probably derive the correct Path from it. `ce-centos-7.3.1611-install` has `Path: "{{.Machine.Path}}/compute.ks"` (https://github.com/digitalrebar/provision-content/blob/master/bootenvs/ce-centos-7.3.1611.yml#L27) which should turn into something like `http://drp.local:8091/machines/<MACHINE_UUID>/compute.ks`
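
A minimal sketch of deriving that URL, assuming `{{.Machine.Path}}` renders to `machines/<MACHINE_UUID>` as in the example above. The helper name `template_url` and the UUID are made up for illustration; only the `compute.ks` file name and port 8091 come from this thread.

```shell
#!/usr/bin/env bash
# Sketch: build the URL where a rendered bootenv template is served.
# Assumption: {{.Machine.Path}} renders to "machines/<MACHINE_UUID>".

template_url() {
    local base="$1"   # file server base URL, e.g. http://drp.local:8091
    local uuid="$2"   # machine UUID (placeholder value in the example below)
    local file="$3"   # template file name taken from the bootenv's Path
    # Strip one trailing slash from the base so the result has no "//".
    printf '%s/machines/%s/%s\n' "${base%/}" "$uuid" "$file"
}

# Example with a made-up UUID:
template_url "http://drp.local:8091" "00000000-0000-0000-0000-000000000000" "compute.ks"
```

Feeding the result to `curl` should only return the kickstart while the machine is still in its -install bootenv, since the rendered file is removed afterwards.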

wdennis
2017-10-11 18:48
Yes, I knew you could only see it while the bootenv is set to the -install and before it gets changed back to "local"

greg
2017-10-11 19:01
@lae is correct

wdennis
2017-10-11 19:02
@lae @greg @shane Thanks

wdennis
2017-10-11 19:08
@greg No `post-install.sh` used (needed) on the RHEL/CentOS install (b/c have `%post` in kickstart, right?)

greg
2017-10-11 19:08
centos has all in kickstart - ubuntu uses both preseed and post-install.sh

wdennis
2017-10-11 19:09
OK, what I figured

wdennis
2017-10-11 19:46
Does os-discovery Package `v1.0.0-tip-30` require a new sledgehammer ISO to be downloaded?

wdennis
2017-10-11 19:47
All my *discovery and *sledgehammer Bootenvs are "X" now, having updated to that?

shane
2017-10-11 19:48
your current sledgehammer/discovery image you want can be listed as:
```
[root@5min-drp-ewr1-00 ~]# ./drpcli contents show os-discovery | jq '.sections.bootenvs.discovery.OS.IsoFile'
"sledgehammer-b689ed6b5e0dd74677acc3ffe9b8cafc5b7c8357.tar"
```

shane
2017-10-11 19:49
kind of a long JSON parameter to get - you could just `| grep IsoFile`

shane
2017-10-11 19:49
compare the sledgehammer SHA sum to the one you have
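
That comparison can be sketched as a small check. This is only an illustration: `iso_is_staged` is a hypothetical helper, and the isos path is the isolated-mode location discussed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: is the ISO/tar that a bootenv expects already in the isos directory?

iso_is_staged() {
    local expected="$1"   # file name from .OS.IsoFile, without surrounding quotes
    local isos_dir="$2"   # e.g. drp-data/tftpboot/isos in isolated mode
    # Succeeds (exit 0) only if a regular file with that exact name exists.
    [ -f "$isos_dir/$expected" ]
}
```

One way to use it: capture the expected name with `expected=$(./drpcli contents show os-discovery | jq -r '.sections.bootenvs.discovery.OS.IsoFile')` (the `-r` drops the JSON quotes), then run `iso_is_staged "$expected" drp-data/tftpboot/isos` and fetch the image if the check fails.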

wdennis
2017-10-11 19:52
Another q - is the "assets" subdir in isolated DRP still a thing? Or is everything that's used living in "drp-data" now?

shane
2017-10-11 19:53
everything is in drp-data/ in isolated mode

shane
2017-10-11 19:53
production pushes to /var/lib/dr-provision as a base with "digitalrebar" and "tftpboot" being subdirs in both cases

wdennis
2017-10-11 19:54
So the ISOs in use live in `drp-data/tftpboot/isos/` then?

shane
2017-10-11 19:54
yes - that's where the drpcli command will push them to

shane
2017-10-11 19:55
and where the 'explode' function finds them to explode them out in to your tftpboot/ directory appropriately

shane
2017-10-11 19:55
you can technically just copy the ISO to that directory - and just 'kill -HUP' the dr-provision service

wdennis
2017-10-11 19:55
OK

shane
2017-10-11 19:55
it'll see the new ISO and explode it out

wdennis
2017-10-11 19:56
Then this is my discovery problem:
```
[dradmin@dr-admin drp]$ tree drp-data/tftpboot/isos/
drp-data/tftpboot/isos/
├── CentOS-7-x86_64-Minimal-1611.iso
├── sledgehammer-80d6b866edba30a81fce1783b9f745ce9a003e13.tar
├── sledgehammer-b689ed6b5e0dd74677acc3ffe9b8cafc5b7c8357.tar
├── ubuntu-16.04.2-server-amd64.iso
└── ubuntu-16.04.3-server-amd64.iso

0 directories, 5 files
[dradmin@dr-admin drp]$ ./drpcli contents show os-discovery | jq '.sections.bootenvs.discovery.OS.IsoFile'
"sledgehammer-f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15.tar"
```

shane
2017-10-11 19:57
if your DRP endpoint has internet access you can just run 'drpcli bootenvs uploadiso FOO' (iso name) and it'll download based on the bootenv specified HTTP location; stage it in to the isos directory, and tickle the explode function to do its thing

wdennis
2017-10-11 19:57
Yup, doing that now

shane
2017-10-11 19:57
we definitely need to do a little more around (at least) notifying on content update that a new ISO image version is required

wdennis
2017-10-11 19:58
Request: when Content Packs are updatable, warn end user if dependencies will need to be updated as well (like the ISOs)

wdennis
2017-10-11 19:58
Ah, beat me to it :slightly_smiling_face:

2017-10-11 20:02
hello all - i need a little bit of clarification.. when running the curl command that should be executed as root or as a sudo user?

shane
2017-10-11 20:03
hi @iamjes - Shane here w/ RackN, pleased to meet you

shane
2017-10-11 20:04
the curl command does not need to be run as root

shane
2017-10-11 20:04
the DRP endpoint needs access to bind to port 67 and 69 for DHCP and TFTP if you are not using your own services

shane
2017-10-11 20:05
you can either run DRP as root, or you can use the setcap capabilities to allow dr-provision binary to bind to those low ports as a non-root user - keeping it completely contained to an unprivileged user account
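
A rough sketch of the `setcap` route. It only prints the command, since applying it needs root and a real binary. `cap_net_bind_service` is the standard Linux capability for binding ports below 1024; whether DRP needs further capabilities is not confirmed here, and the binary path and the helper name `setcap_command` are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) a setcap command for a non-root dr-provision.
# Assumption: cap_net_bind_service alone is shown; DRP may need more.

setcap_command() {
    local binary="$1"   # path to the dr-provision binary (assumed location)
    printf 'sudo setcap cap_net_bind_service=+ep %s\n' "$binary"
}

# Example with an assumed install path:
setcap_command "$HOME/bin/dr-provision"
```

After running the printed command once, dr-provision can be started from the unprivileged account without sudo. The capability is stored on the file itself, so it has to be re-applied whenever the binary is replaced.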

2017-10-11 20:05
I have a machine with two nic's and the isolated install is the best way to run and then configure the dhcp and pxe after network interfaces have been updated with the internal ip address?

shane
2017-10-11 20:06
it's probably easier to run in isolated mode - there is very little difference; other than where the local installed content gets put

shane
2017-10-11 20:07
if you do - all content will be located in `~/drp-data` of the user account you install as

shane
2017-10-11 20:09
and, yes - you can add "Subnets" to your DRP endpoint after install - once you enable the subnet, that will turn on the DHCP services for that Subnet

shane
2017-10-11 20:10
so you can exclude (by NOT configuring) a subnet - and no DHCP services will run there

wdennis
2017-10-11 20:10
@IAMJES If it helps plan, I am running in isolated mode, here's the disk usage & directory structure:
```
[dradmin@dr-admin drp]$ du -h -d 2 .
99M     ./bin/linux
98M     ./bin/darwin
99M     ./bin/windows
295M    ./bin
20K     ./tools
5.6G    ./drp-data/tftpboot
212K    ./drp-data/digitalrebar
100K    ./drp-data/saas-content
0       ./drp-data/plugins
0       ./drp-data/job-logs
5.6G    ./drp-data
7.7G    .
```

2017-10-11 20:11
ok thanks

wdennis
2017-10-11 20:12
That's with CentOS 7, Ubuntu 16.04 and discovery ISO images

shane
2017-10-11 20:12
@iamjes - you can also specify that you require "Reservations" for DHCP leases; and you can be very prescriptive about which systems in a given subnet will receive a DHCP lease, and subsequently provision against the DRP endpoint

shane
2017-10-11 20:12
you have the option of specifying whether Reservations are required or optional

shane
2017-10-11 20:13
if optional; then any system that PXE boots and requests DHCP will be answered; if required, only systems with a Reservation will be responded to

2017-10-11 20:15
thats nice - this is for a small network and i guess once they are provisioned in 'sledgehammer' then id like them to move into a permanent ip address allocation

shane
2017-10-11 20:19
you can do that by specifying via a Reservation; in which case, the Reservation assigned IP address would become the final IP address of the host

shane
2017-10-11 20:20
the alternative method is to add a post-provisioning `task` that re-IPs the host after initial provisioning activity is complete

greg
2017-10-11 20:20
or have that task convert the lease to a reservation (maybe that is what you meant as well).

2017-10-11 20:21
or that :)

greg
2017-10-11 20:21
both are doable.

2017-10-11 20:22
how long does it take for the API server to start?

shane
2017-10-11 20:23
if you have no content in the system - just a few seconds

wdennis
2017-10-11 20:23
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F7GU3ADNE/pxe_install_os_options.pdf and commented: @shane @greg Another longer-term question - take a look at the attached file; my question is what options does DRP support today, which may it support in future, which are unsupported (possibly b/c underlying install answer file format does not have the capability?)

2017-10-11 20:26
and i can install apache2 and put in my ssl before running the install

2017-10-11 20:43
looks like it accepted just fine

2017-10-11 20:54
while i am waiting for the last of it to come which page should i use first?

greg
2017-10-11 21:02
@IAMJES - not sure which page you mean? Which UI page to start with?

greg
2017-10-11 21:03
https://<IP>:8092/

2017-10-11 21:03
thanks

greg
2017-10-11 21:03
accept the self-signed cert (or your own) and then look for info/preferences

2017-10-11 21:07
i have my own cert so if you go to https://rebar.010101.info it comes up with the apache test page and good lock - now it isnt and im redirected to rackn

greg
2017-10-11 21:07
cool

greg
2017-10-11 21:08
hmm

greg
2017-10-11 21:08
need to think about that for a moment.

2017-10-11 21:08
https://rackn.github.io/provision-ux/#/e/rebar.010101.info:8092/system

greg
2017-10-11 21:08
oh cool - does that work?

greg
2017-10-11 21:08
I guess it does.

greg
2017-10-11 21:09
Probably will.

greg
2017-10-11 21:09
yeah should. Nice.

greg
2017-10-11 21:09
You can also alter the port if you want.

2017-10-11 21:14
i dont care about the port - just wondering why i was redirected

greg
2017-10-11 21:15
The UI is served from cloud resources.

greg
2017-10-11 21:15
It then attaches to the API endpoint you started with.

greg
2017-10-11 21:16
That lets the UI update faster and more responsively than DRP

2017-10-11 21:16
will that redirect ever go away?

greg
2017-10-11 21:16
The intent is no. DRP may grow the ability to serve the UI locally, but it isn't currently packaged that way.

greg
2017-10-11 21:17
What is the concern?

2017-10-11 21:18
how can i make this all secure?

shane
2017-10-11 21:20
@iamjes - the DRP Endpoint never reaches out to the RackN Portal, the only way it is "managed" is via your web browser, which sits between the DRP Endpoint and the RackN Portal

shane
2017-10-11 21:21
your DRP Endpoint can be 100% isolated from the outside - and it'll operate fully

2017-10-11 21:21
ok

shane
2017-10-11 21:23
your web browser acts as an intermediary - technically via CORS connections (cross origin resource sharing - https://en.wikipedia.org/wiki/Cross-origin_resource_sharing ) to connect **from** your browser to your DRP Endpoint - and **from** your browser to the RackN Portal

shane
2017-10-11 21:24
if you do not use/do that - then there is no UI to manipulate/manage the DRP Endpoint with - but it's still 100% operable via the Command Line (drpcli) or API access within your local environment

shane
2017-10-11 21:25
you can further secure your DRP Endpoint so you do not need to run it as Root; via the `setcap` capabilities

shane
2017-10-11 21:27
@shane uploaded a file: https://rackn.slack.com/files/U6QFVRJNB/F7GSXMDT5/using__setcap__to_run_as_non-privileged_user.yaml and commented: Beginnings of my write up on how to secure the DRP Endpoint (dr-provision daemon).

shane
2017-10-11 21:28
All API (and subsequently CLI) requests are authenticated via either a username/password pair, or you can create Tokens with limited scope to reduce the permissions and time that a given API call can execute

shane
2017-10-11 21:28
it is advisable to change the default username/password pair for the DRP Endpoint

shane
2017-10-11 21:29
and if you are concerned about other people accessing the API endpoint, further secure the DRP ports (67 and 69 for DHCP and TFTP, 8091 and 8092 for HTTP and HTTPS) to JUST the hosts being provisioned and your admin access points; via the use of a local Firewall and policies on the DRP Endpoint - or intermediate firewall between the DRP Endpoint and your other networks
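
A sketch of what such a local firewall policy might look like for the two TCP ports, assuming iptables and a single trusted CIDR. The rules are printed rather than applied, and `drp_firewall_rules` and the example network are made up for illustration. DHCP (67/udp) and TFTP (69/udp) are left out on purpose: DHCP discovery arrives from 0.0.0.0, so a source filter on port 67 would block the hosts you want to provision.

```shell
#!/usr/bin/env bash
# Sketch: print iptables rules limiting 8091/tcp and 8092/tcp to one CIDR.
# Nothing is applied; review the output before running it as root.

drp_firewall_rules() {
    local cidr="$1"   # trusted network, e.g. provisioning + admin hosts
    local port
    for port in 8091 8092; do
        # Allow the trusted network first, then drop everyone else.
        printf 'iptables -A INPUT -p tcp --dport %s -s %s -j ACCEPT\n' "$port" "$cidr"
        printf 'iptables -A INPUT -p tcp --dport %s -j DROP\n' "$port"
    done
}

# Example with an assumed provisioning network:
drp_firewall_rules "192.168.1.0/24"
```

If the RackN Portal is used, the browser's machine has to sit inside the allowed range for 8092, since the UI talks to the endpoint from the browser.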

2017-10-11 21:35
thats fine, as long as i understand why I am all good

lae
2017-10-11 21:38
now, if only I had time to cleanup my ansible role for deploying DRP in that manner

lae
2017-10-11 21:39
(and publishing it lol)

shane
2017-10-11 21:39
it doesn't count if it isn't published ... !

lae
2017-10-11 21:39
!

greg
2017-10-11 21:39
theoretically at best :slightly_smiling_face:

2017-10-11 21:41
as long as it gets in there thats ok

2017-10-11 21:47
when signing up for an account on rackn beta it is looking for my name and not my domain name in the family name?

2017-10-11 21:49
nevermind i got it

2017-10-11 22:23
last question for the day - when i reboot the server and i want it to execute silently what is the command i should run?

shane
2017-10-11 22:24
the DRP endpoint ?

2017-10-11 22:24
the install on my side

shane
2017-10-11 22:25
`./dr-provision --base-root=$HOME/drp-data --local-content= --default-content= > drp-local.log 2>&1 &`

2017-10-11 22:26
ok and i can put that into a sh file - thank you

shane
2017-10-11 22:26
that will log to file "drp-local.log", and send both stdout and stderr to that file, and put dr-provision in the background

2017-10-11 22:26
fantastic

shane
2017-10-11 22:26
you may need the `--static-ip=<IP_ADDRESS>` of your DRP Endpoint

shane
2017-10-11 22:27
option

2017-10-11 22:27
ok

2017-10-11 22:41
i had a small power surge here and when i go back to run i do this and get ... rebar@rebar:~$ sudo ./dr-provision --static-ip=192.168.1.188 --base-root=/home/rebar/drp-data --local-content="" --default-content="" & [9] 1488

2017-10-11 22:41
i dont care if it has to all start over

greg
2017-10-11 22:43
If you do sudo, don't add the &. It is probably asking you for password. Once started, background it

2017-10-11 22:43
i thought that was the job of the &

shane
2017-10-11 22:44
do this instead:
```
sudo date
sudo ./dr-provision --base-root=$HOME/drp-data --local-content= --default-content= > drp-local.log 2>&1 &
```

greg
2017-10-11 22:45
Yeah but sudo might ask for a password and the backgrounding by & will cause the sudo to stop all execution

shane
2017-10-11 22:45
sudo is asking for your password - if you issue "sudo date" first, it'll ask for your password, then authenticate you

greg
2017-10-11 22:45
Yeah that

shane
2017-10-11 22:45
you can temporarily run other sudo commands without being challenged for a password

shane
2017-10-11 22:46
the problem is putting dr-provision in the background, doesn't allow you to authenticate your sudo request - your TTY is disconnected from stdin

shane
2017-10-11 22:46
you could also change sudo to allow you to run "dr-provision" binary without password challenge

2017-10-11 22:48
where do i go to remove everything - now i get this...

2017-10-11 22:48
rebar@rebar:~$ sudo date ./dr-provision --base-root=$HOME/drp-data --local-content= --static-ip=192.168.1.188 --default-content= > drp-local.log 2>&1 & [17] 1634 [16] Exit 1

shane
2017-10-11 22:49
`sudo date` should be run by itself first .... *then* run the `sudo ./dr-provision...` after you've authenticated with `sudo date`
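
The two-step sequence can be wrapped in a small function so it fits in a start script. This is a sketch only: it uses `sudo -v` (which just refreshes cached credentials) in place of `sudo date`, and it assumes sudo's credential caching is enabled, which is the default. The `SUDO` override is a hypothetical hook so the function can be dry-run with `SUDO=echo`.

```shell
#!/usr/bin/env bash
# Sketch: authenticate sudo in the foreground, then background dr-provision.
# SUDO is a hypothetical override for dry runs (e.g. SUDO=echo start_drp).

start_drp() {
    local sudo_cmd="${SUDO:-sudo}"
    # Prompt for the password now, while the terminal is still attached.
    "$sudo_cmd" -v || return 1
    # Credentials are cached, so the backgrounded sudo should not prompt.
    # The redirect is done by this shell, so the log is owned by the caller.
    "$sudo_cmd" ./dr-provision --base-root="$HOME/drp-data" \
        --local-content= --default-content= > drp-local.log 2>&1 &
}
```

Call `start_drp` from the directory that holds the dr-provision binary; add `--static-ip=<IP_ADDRESS>` to the command if your endpoint needs it.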

2017-10-11 22:50
[13] 1643 is all i get

2017-10-11 22:51
where do i remove everything to start over - or should i just re-image the server

2017-10-11 22:58
i got it

shane
2017-10-11 23:11
:slightly_smiling_face:

wdennis
2017-10-12 03:19
O noes! A "barclamp" reference in the `burnin` Content pack?

wdennis
2017-10-12 03:19
`Automatically generated from the burnin barclamp`

wdennis
2017-10-12 03:19
shudders

2017-10-12 12:31
Looks like i am missing out on the 'ux' - how do i get those tools

wdennis
2017-10-12 12:34
@IAMJES The "UX" (techno term for "user experience") is the RackN portal website (the one you redirect to when you hit `https://<drp_server_ip>:8092`)

wdennis
2017-10-12 12:35
Also, if you are using Gitter, you should request a Slack invite to http://rackn.slack.com

2017-10-12 12:35
my page looks nothing like what i am seeing in the videos

2017-10-12 12:36
I am in Gitter

2017-10-12 12:37
:worried: i guess i need that

wdennis
2017-10-12 12:41
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F7GJZ0AUQ/drp_ux.png and commented: DRP UX screen

wdennis
2017-10-12 12:41
You aren't getting this?

2017-10-12 12:44
thats the one i get but thats not the one you see in the youtube videos

wdennis
2017-10-12 12:44
YouTube link?

wdennis
2017-10-12 12:44
Is the webpage color green?

2017-10-12 12:45
https://www.youtube.com/watch?v=6xuVm9PJ2ck

wdennis
2017-10-12 12:46
Ah yes, that's the prior version (v2) of what was then called just "Digital Rebar"

2017-10-12 12:47
those wizards were slick! where did they go?

wdennis
2017-10-12 12:47
Now they've moved to v3 (current ver 3.1.0) of what's now called "Digital Rebar Provision"

2017-10-12 12:48
deploying openstack on k8 is the goal

wdennis
2017-10-12 12:48
They refactored the "product" into DRP - greatly simplified it

wdennis
2017-10-12 12:51
You should speak with @zehicle about how to get O'stack on K8s - DRP can provide the K8s (via Ansible "Kubespray" integration with DRP) but how to then get O'stack running on K8s, I'm not too sure of...

2017-10-12 12:51
in this video he made it look too easy

wdennis
2017-10-12 12:52
Yes, YMMV though - I found DR v2 to be a twisty maze, all passages looking alike :slightly_smiling_face:

wdennis
2017-10-12 12:52
(my opinion/experience only)

2017-10-12 12:56
I will take your word for it

zehicle
2017-10-12 12:58
@IAMJES that integration broke when they moved into big tent and started to vertically integrate the install. Happy to talk 1x1 on what it would take. I'm still excited about the approach, but...

zehicle
2017-10-12 12:59
It's going to take some investment

2017-10-12 13:03
@zehicle I am still working through some of the other options to get them working the way i need them. I really liked the wizards you all were showing in the videos. My long term goal is to either deploy openstack using your tool or making it so people can login and launch environments as needed

2017-10-12 13:29
I am getting a little turned around in the profiles, templates, etc... I am trying to make it so a machine goes through sledgehammer, and then is deployed ubuntu 16.04 with 32gb swap

2017-10-12 13:29
I am thinking it is a operator error

shane
2017-10-12 13:30
have you successfully deployed Ubuntu with the stock templates first ?

2017-10-12 13:31
thats also part of my confusion.. i created a new template i thought from ubuntu drp... and made START. in start the contents are the IP addresses of the machines that have been through sledgehammer

shane
2017-10-12 13:36
I'd suggest you do things in very small steps - first deploy the stock Ubuntu to a single node.

shane
2017-10-12 13:37
Then make a small change to inject new Root PW and SSH Keys - get comfortable with that

shane
2017-10-12 13:38
then clone a template (say ce-net-post-install.sh.tmpl) make a few tweaks, and see how it operates

shane
2017-10-12 13:38
from there you can clone ce-net-seed.tmpl - and start making hacks to reflect your requirements

2017-10-12 13:38
i cloned the ubuntu drp to START

shane
2017-10-12 13:39
do you mean you cloned the "boot environment" ?

2017-10-12 13:40
no - templates right below boot environments

shane
2017-10-12 13:40
which was the source template you cloned ?

2017-10-12 13:41
ubuntu-drp-only-repos.tmpl

2017-10-12 13:42
my portal is acting odd and i little locks...

2017-10-12 13:42
i have*

2017-10-12 13:47
ill be back and go through more of the tutorials

shane
2017-10-12 13:49
@iamjes - the UI is definitely still a Beta version - it's only a couple of weeks old at this point - there are a few places when the UI will "hang" (like after cloning a template) - you need to reload the page, and that fixes it ...

2017-10-12 14:07
how do i put a png in here

shane
2017-10-12 14:08
what do you mean by a png?

2017-10-12 14:09
This is the error i am getting ... I have cloned the template and nothing not even the defaults appear - png is a file type for pictures https://imgur.com/a/UIKf8

shane
2017-10-12 14:09
ah - you mean a png in the conversation ? :slightly_smiling_face:

shane
2017-10-12 14:10
do you have Slack ? It's easier to add snippets of text, images, etc via the Slack app - not sure how to do it through the gateway you're coming in from

2017-10-12 14:10
im just using the web page - anyway the link i put in there is the error i am getting

2017-10-12 14:11
or lack thereof

shane
2017-10-12 14:11
can you point me to the web page you're using ?

shane
2017-10-12 14:11
if you choose to use slack, you can request a Slack invite from: http://www.rackn.com/support/slack/

shane
2017-10-12 14:13
is your goal to simply use a local repo for your installs ?

2017-10-12 14:22
@rackneng yes it is i sent a PM to rackneng

2017-10-12 14:27
@rackneng here is the url https://rackn.github.io/provision-ux/#/e/rebar.010101.info:8092/stages/

shane
2017-10-12 14:35
@iamjes - if you are just trying to change the Repo mirror you use, it's simple to add a few Params with the right names, to change the mirror. You don't need to clone/modify the Template file

shane
2017-10-12 14:36
I'd again ask - what is the goal you are trying to achieve? Let's help you get there; best if we take it in slow steps as you get used to the system

2017-10-12 14:41
OK - step one provision a machine with ubuntu

shane
2017-10-12 14:41
:slightly_smiling_face:

shane
2017-10-12 14:41
we recommend doing things iteratively - first make sure a stock ubuntu deploy works happily ...

shane
2017-10-12 14:42
then let's start sorting out how to hack it to make it your own

2017-10-12 14:46
Ok

greg
2017-10-12 15:15
For those building DRP directly, you will now need go 1.9 for tip builds. We are using some new testing features so that we can get better stack traces on test failures.

vlowther
2017-10-12 15:16
Specifically, some of the test helpers have been marked with the new t.Helper() method

2017-10-12 15:29
hi there! can you discuss for a few minutes the built-in iso\images at the next meetup? i'm interested in the level of support: is this a community feature, or do you have some plan for it? for example - my opinion: the sledgehammer build script has too many hardcoded URLs with blobs, and the only proper way to use it is to download the already built tar archive, or build your own discovery image\reinvent the wheel.

shane
2017-10-12 15:33
@kolomnitcki - we'd be happy to have you post a comment in the Agenda document requesting this - we are planning to have a community feedback discussion as part of next meetup - your input would be appreciated: https://docs.google.com/document/d/1DGuqkjM-oZQ37GLcpwkSIzyTKpPZknTgqjQBS5uusoY/edit#heading=h.ifc9tve9wbk4


shane
2017-10-12 15:34
if you'd add a Comment to the "Community Feedback" section ...

2017-10-12 15:37
ok

2017-10-12 15:38
@rackneng - Greg - 1.9 ?

2017-10-12 15:40
The other question i have is how do i make it so docker is part of the initial build

greg
2017-10-12 15:41
golang 1.9

greg
2017-10-12 15:41
sorry - most people aren't building DRP directly.

greg
2017-10-12 15:41
just a warning. And really it is only required for running the tests, but run the tests please if you build it.

2017-10-12 15:42
also in this url http://provision.readthedocs.io/en/stable/doc/install.html - talks about the user experience 'ux' still being available - is it not gone?

greg
2017-10-12 15:43
it is now /ui or /. The redirect is the new way.

2017-10-12 15:44
which drp has the wizards that i saw in the youtube videos.. once i get to where i am going how hard is it to upgrade drp

2017-10-12 15:45
sorry if i didnt say that right - still learning the language here

2017-10-12 15:48
I knew i worked on this before... back in the day...

2017-10-12 15:48
cd ~
mkdir digitalrebar
git clone https://github.com/rackn/digitalrebar-deploy digitalrebar/deploy
ln -s digitalrebar/ digitalrebar/deploy/compose/digitalrebar
cd digitalrebar/deploy
./run-in-system.sh --deploy-admin=local --wl-docker --access=HOST --con-provisioner --con-dhcp --admin-ip=192.168.99.1/24

ctrees
2017-10-12 18:09
are these moved ?



ctrees
2017-10-12 18:12
aka rackn >sb> digitalrebar

ctrees
2017-10-12 18:13
* download RackN "drp-rack-plugin", which is available at:

shane
2017-10-12 18:22
hey chris - I'm updating that code to work correctly - but in general the "VER_PLUGINS", "DRP_OS", and "DRP_ARCH" variables need to be set correctly

ctrees
2017-10-12 18:23

ctrees
2017-10-12 18:24
so I just figured it's moving ?

ctrees
2017-10-12 18:24
... unless you're doing github 'magic'

ctrees
2017-10-12 18:25
Or that's private repo stuff...

shane
2017-10-12 18:29
private repo stuff - (most) all of the public content has moved to the digitalrebar repo

ctrees
2017-10-12 18:36
so... how would I actually get that OR is the script going to do that ?

shane
2017-10-12 18:38
I'm working on those details right now for you - and that'll be updated either in the README or the script - depending on how automated I can make it - it requires authentication since it's registered (free) use content from RackN that makes it work in the http://packet.net environment

ctrees
2017-10-12 18:39
OK... I'm about done for day anyway, should I just check the git readme tomorrow ?

ctrees
2017-10-12 18:40
I did register myself for beta rackn... got the email back

ctrees
2017-10-12 18:41
OH... I just saw if [[ "$USER" == "shane" ]]

ctrees
2017-10-12 18:41
:wink:

shane
2017-10-12 18:41
excellent - yes, I'll update the git repo README today when I sort it out - and I'll drop you a DM here to let you know

ctrees
2017-10-12 18:42
ok... thanks

shane
2017-10-12 18:42
yeah ... make it easy-peasy for me to grab my pre-staged secrets/content :slightly_smiling_face:

2017-10-12 18:43
how do i pay for the rackn add-ons?

shane
2017-10-12 18:44
um

ctrees
2017-10-12 18:44
I could go do it manually... but you had the terraform going AND the guys here think they want to go terraform too... so figure I'd sort of force-feed your pattern... if Doc Gray accepts then at least 3 user groups and companies will follow...

2017-10-12 19:03
just a fyi - i am not able to add providers

shane
2017-10-12 19:09
@iamjes - are you getting an error? how are you trying to add a Provider ?

2017-10-12 19:10
@rackneng - no errors nothing happens

shane
2017-10-12 19:15
it looks like the 3 plugin providers are installed correctly - "slack", "ipmi", and "packet-ipmi"

shane
2017-10-12 19:15
you don't need "packet-ipmi" if you aren't provisioning in the http://packet.net environment

shane
2017-10-12 19:16
once you add the "Plugin Provider", you have to create System --> "Plugin" of the type you want to use

shane
2017-10-12 19:16
for example - Add a plugin of type "IPMI" to enable IPMI (power on/off/reset) actions

2017-10-12 19:17
when i go to systems -> plugins and click on IPMI nothing happens

shane
2017-10-12 19:18
Did you do "Add", then "Use Provider" ??

2017-10-12 19:19
yes - when i do that i get a box in the middle of the screen to click on 'Add'

2017-10-12 20:11
@rackneng - nice youtube video on how to use packet for using community content - is there any that shows how to deploy ubuntu machines on premise?

2017-10-12 20:47
@rackneng - maybe its just me the UI has some catching up to do of how things are done at CLI

shane
2017-10-12 20:57
@iamjes ... UI is BETA - it was only released about 2 weeks ago tops - so, yes, it has some catching up to do - but it's leaps and bounds better than the green UI in the 3.0.x versions !! :wink:

2017-10-12 20:59
@rackneng - Shane ill take your word for it once i see all the lines of code in json i get lost so fast trying to keep with the video

shane
2017-10-12 21:00
JSON makes any sane person's eyes go crossed ....

shane
2017-10-12 21:00
anyone that likes JSON has something slightly wrong with them ... but goodness help me ... it's a MILLION times better than XML !!!

2017-10-12 21:01
all i want for is to learn how to 'hammer' the ubuntu deployment

shane
2017-10-12 21:02
you can do that without any of the advanced plugins - with purely the Community Content - the "ce-discovery", "ce-sledgehammer", and the "ce-ubuntu-16.04-install" content - but you won't get the IPMI reboot capabilities and the advanced Workflows of the Stages and Tasks stuff - which is pretty cool stuff

shane
2017-10-12 21:03
but the UI isn't 100% nailed yet, and so some of the things may need to be finished at CLI ... I'm happy to help you with those things

2017-10-12 21:05
i have all the other groundwork done i think

2017-10-12 21:05
rebooting server

2017-10-12 21:08
done and online

2017-10-12 21:09
there was another template video in march and the paths were different and i couldnt find my way around - hence my emphasis on the UI

shane
2017-10-12 21:10
There are 3 main versions of the product that you are going to find videos for .... the older Digital Rebar version 2 solution, which was based on a containerized system, Digital Rebar Provision (DRP) ver 3.0.x, and the current DRP 3.1 version

shane
2017-10-12 21:11
any videos for DRver2 should be ignored ... half of the content in the DRP 3.0.x videos is ... probably ... not right ...

2017-10-12 21:11
:worried:

shane
2017-10-12 21:11
you want to focus on any videos for DRP ver 3.1

shane
2017-10-12 21:12
the product went through a very large metamorphosis between DRver2 and DRP ver 3 ...

shane
2017-10-12 21:12
from 3.0.x to 3.1 there was a huge amount of new features and capabilities added - and that was released only a month ago

shane
2017-10-12 21:13
in addition, the UI was completely reworked/rebuilt/rewritten from scratch along with that

2017-10-12 21:13
is version 3 the wizards?

shane
2017-10-12 21:14
I think the "wizards" you are referring to is the older DRver2 product

2017-10-12 21:14
once i get this figured out i dont mind working on some training word documents for dummies

shane
2017-10-12 21:14
none of those work/relate/pertain to the DRPv3 line

shane
2017-10-12 21:14
DRP is a community open source project, and we'd definitely welcome any additions/help with the documentation as it is... we know it's pretty rough

2017-10-12 21:16
once i am able to understand this again and be able to do it over again i dont mind documenting my steps screen for screen and any commands done in the terminal

2017-10-12 21:16
then you all can wordsmith and then publish however you like

shane
2017-10-12 21:17
sounds good !

2017-10-12 21:18
so here is where i am now .. https://rackn.github.io/provision-ux/#/e/rebar.010101.info:8092/system

2017-10-12 21:18
i have grabbed all the boot environments i think i need and uploaded them to the server, the preferences are correct i think

shane
2017-10-12 21:20
you're not operating in the http://packet.net environment - I don't think ... right ?

2017-10-12 21:20
so reading the panes i am led to believe that we work our way down from templates -> params -> profiles? or am i working backwards

shane
2017-10-12 21:20
so no need for anything labeled with "packet"

2017-10-12 21:20
No i have all my servers here

shane
2017-10-12 21:20
future note; you can delete all content/plugins related to packet and you'll be fine

shane
2017-10-12 21:21
those are just helper pieces that do reboots and help inject SSH keys through packet's metadata services to add to a provisioned server

shane
2017-10-12 21:21
the ordering of setting things up doesn't matter - just that "all the things get set up"

shane
2017-10-12 21:21
before you try to provision

2017-10-12 21:23
ok i found the packet item and removed it

shane
2017-10-12 21:23
also - if you are using the RackN content, you do not need the "drp-community-content" Contents

shane
2017-10-12 21:23
and - the associated BootEnvs of ce-* (eg ce-discovery, ce-sledgehammer, ce-ubuntu...)

2017-10-12 21:23
this was what i used curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/stable/tools/install.sh | bash -s -- --isolated install

shane
2017-10-12 21:23
it'll just confuse things if you try and mix-n-match them inappropriately

shane
2017-10-12 21:24
when you do the install - if you know you are going to use RackN content - and not drp-community-content, you can add a `--nocontent` flag to the install

shane
2017-10-12 21:24
and it won't be installed

shane
2017-10-12 21:24
should you decide to add it back in - you can just add the "drp-community-content" Contents back, along w/ the associated BootEnvs

2017-10-12 21:25
you are talking about the content packages? the default items never downloaded so i did it myself for the files

shane
2017-10-12 21:25
right - there are 2 types of "Content"

shane
2017-10-12 21:26
"community content" == drp-community-content in the Contents screen, and associated BootEnvs with ce-* names

shane
2017-10-12 21:26
that stuff is all 100% open/free to community use and input/modifications/etc

shane
2017-10-12 21:27
RackN also distributes more advanced content ... which we just refer to as "RackN Content" - you must register for the use of that content, and a lot of it's free for use, as long as you register

2017-10-12 21:27
i am registered

shane
2017-10-12 21:27
there is some content you can purchase, which is even more advanced and customized pieces that are pay for content

shane
2017-10-12 21:27
:slightly_smiling_face: yep

shane
2017-10-12 21:27
with the use of the RackN registered/free content - you do NOT need the Community Content

shane
2017-10-12 21:27
and in your case, since you are not operating in http://packet.net - you do not need any "packet" stuff

2017-10-12 21:28
i think i removed all the packet stuff

shane
2017-10-12 21:28
now - "Content" is a number of things

shane
2017-10-12 21:28
"Contents" are just bundles of similarly grouped things

shane
2017-10-12 21:28
"Plugin Providers" are add-on features - which need to be configured as a "Plugin" (you saw that earlier)

shane
2017-10-12 21:29
"BootEnvs" are individual install pieces (eg Centos or Ubuntu) operating systems

2017-10-12 21:29
i have both ubuntu iso's there

shane
2017-10-12 21:29
yep

2017-10-12 21:30
so it is easier to make a new profile

2017-10-12 21:31
call it plain-ubuntu-16

2017-10-12 21:33
*^%$! - profile isnt responding through the UI

shane
2017-10-12 21:34
I've noticed that your Endpoint seems to pause occasionally

2017-10-12 21:34
i am on 150/150 fiber - so i dont know

shane
2017-10-12 21:34
is it a VM on a busy hypervisor or something? it feels like either I/O blocking on hypervisor (eg "noisy neighbor"), or a network connection issue

shane
2017-10-12 21:35
the DRP Endpoint - is it running as a Virtual Machine ?

2017-10-12 21:35
it is not on a vm

shane
2017-10-12 21:35
the DRP 3.1 binary is pretty stable, and doesn't block - unless you are "exploding" ISO content - it will block API calls temporarily as it writes to disk to protect any concurrent access to the written content

2017-10-12 21:36
i dont know

shane
2017-10-12 21:37
our binary is pretty light weight (29 mbyte in size on disk) ... and only requires a few 100 mbyte memory in general

2017-10-12 21:37
there's more than enough for it to run

2017-10-12 21:38
provisioning a machine on pxe is done in about 30 seconds

2017-10-12 21:38
(the start process anyways)

2017-10-12 21:43
so whats the next step

shane
2017-10-12 21:44
we can either drive provisioning manually - or create a stagemap to automate the stepping through of each stage

2017-10-12 21:45
automate!

shane
2017-10-12 21:46
ok - easiest thing to do is create the stagemap with our good old friend JSON

shane
2017-10-12 21:46
```
{
  "Available": true,
  "Description": "Global Ubuntu Stage Map",
  "Name": "global",
  "Params": {
    "change-stage/map": {
      "discover": "ubuntu-16.04-install:Reboot",
      "ssh-access": "complete-nowait:Success",
      "ubuntu-16.04-install": "ssh-access:Success"
    }
  }
}
```

shane
2017-10-12 21:47
assuming you want to Provision Ubuntu 16.04

2017-10-12 21:47
yes

shane
2017-10-12 21:47
write that to something like "stagemap-ubuntu.json" on your DRP endpoint

2017-10-12 21:48
let me find that directory

shane
2017-10-12 21:49
doesn't matter where - we'll inject it with CLI

2017-10-12 21:49
so just put the file anywhere

shane
2017-10-12 21:50
yes - but change the "Name" value to "ubuntu-16.04"; let's not override the "global" name

shane
2017-10-12 21:50
we'll apply this profile to Machines - otherwise, all machines will get the profile (that's what "global" means)

2017-10-12 21:52
i remember some of this from back in the day - when you needed a minimum of 8gb ram to run the install

2017-10-12 21:52
file made

shane
2017-10-12 21:52
on the DRP endpoint, let's run:
```
drpcli profiles create - < stagemap-ubuntu.json
```

shane
2017-10-12 21:52
(substitute "stagemap-ubuntu.json" with whatever you named your json file on disk)
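
A minimal sketch of generating that stage-map file instead of hand-editing the JSON, with the "Name" changed from "global" as suggested above. The field names and stage values are copied from the profile pasted in this conversation; the helper names `build_stagemap_profile` and `write_profile` and the default description text are hypothetical.

```python
import json


def build_stagemap_profile(name, stage_map, description="Ubuntu Stage Map"):
    """Return a profile dict in the same shape as the JSON pasted in the chat."""
    return {
        "Available": True,
        "Description": description,
        "Name": name,
        "Params": {"change-stage/map": dict(stage_map)},
    }


# Stage transitions copied verbatim from the chat.
UBUNTU_MAP = {
    "discover": "ubuntu-16.04-install:Reboot",
    "ubuntu-16.04-install": "ssh-access:Success",
    "ssh-access": "complete-nowait:Success",
}


def write_profile(path, profile):
    """Write the profile as JSON so it can be piped to `drpcli profiles create -`."""
    with open(path, "w") as fh:
        json.dump(profile, fh, indent=2, sort_keys=True)
        fh.write("\n")


if __name__ == "__main__":
    write_profile(
        "stagemap-ubuntu.json",
        build_stagemap_profile("ubuntu-16.04", UBUNTU_MAP),
    )
```

The resulting file is used exactly as in the command above; nothing here talks to the DRP endpoint.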

2017-10-12 21:53
thats what i used

shane
2017-10-12 21:53
:slightly_smiling_face:

shane
2017-10-12 21:54
nice - I see it there in Profiles now

shane
2017-10-12 21:54
so - that "stagemap" is the Workflow that system will go through:

shane
2017-10-12 21:55
discover node will be put in to "ubuntu-16.04-install" bootenv and installed according to that boot environment
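
A rough sketch of how the stage map above can be read as a chain. It assumes each entry means "current stage" mapped to "next stage:action", which matches the description in this chat; what the action part (`Reboot`, `Success`) does is not spelled out here, so the code only carries it along. The function names are hypothetical and this is not DRP's own implementation.

```python
def parse_transition(value):
    """Split 'next-stage:Action' into (next_stage, action)."""
    next_stage, _, action = value.partition(":")
    return next_stage, action


def walk_stages(stage_map, start):
    """Follow the map from `start` until a stage has no outgoing entry."""
    path = [start]
    seen = {start}
    current = start
    while current in stage_map:
        next_stage, _action = parse_transition(stage_map[current])
        if next_stage in seen:
            raise ValueError("stage map loops back to %r" % next_stage)
        path.append(next_stage)
        seen.add(next_stage)
        current = next_stage
    return path


STAGE_MAP = {
    "discover": "ubuntu-16.04-install:Reboot",
    "ssh-access": "complete-nowait:Success",
    "ubuntu-16.04-install": "ssh-access:Success",
}

if __name__ == "__main__":
    print(" -> ".join(walk_stages(STAGE_MAP, "discover")))
```

Starting from `ubuntu-16.04-install` instead of `discover` walks the same chain minus the first hop, which is the situation for a machine that has already been discovered.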

2017-10-12 21:55
so then i bring in a new node and use the stagemap after the sledhammer process?

2017-10-12 21:56
sledgehammer*

2017-10-12 21:58
i dont see that in the system preferences

shane
2017-10-12 21:59
correct - we boot the Machine - and assuming the hardware is set up to PXE Boot - and we have it in the same Layer 2 network - or you have "ip helper" on your switch/router to pass DHCP through to your DRP endpoint

2017-10-12 22:00
just a second

shane
2017-10-12 22:00
the Machine will be initially "discovered" according to the "unknown" BootEnv (discovery)

2017-10-12 22:00
node booted

shane
2017-10-12 22:00
once it's been discovered - we should see it in Machines

shane
2017-10-12 22:01
(assuming your "Subnets" specification is also correct for the machine network)

2017-10-12 22:01
stage 2 almost done

2017-10-12 22:01
were done

shane
2017-10-12 22:01
boom !

shane
2017-10-12 22:01
now you want to edit the machine - and add the new stagemap profile we created

2017-10-12 22:02
almost every time i do that the page never finishes for me to edit

2017-10-12 22:02
cant edit

shane
2017-10-12 22:03
change the Profile to "Ubuntu-16.04" and set it "runnable"

2017-10-12 22:05
i went to overview and clicked the + for ubuntu 1604

2017-10-12 22:05
or was i to take a different path

shane
2017-10-12 22:06
Um ... I've never used the "overview" screen :slightly_smiling_face:

shane
2017-10-12 22:07
I'd make the change in the Machines screen and edit the machine - I'm not certain what "things" overview will twiddle for us

shane
2017-10-12 22:08
also - your IPMI plugin isn't actually created (under System) - so you'd have to manually reboot your node for now - until you add your IPMI user/pass credentials

2017-10-12 22:08
went to system -> machines and clicked the name but the screen just says loading

shane
2017-10-12 22:09
try again

2017-10-12 22:09
no luck

2017-10-12 22:10
i broke it

2017-10-12 22:12
i have been using chrome ...

2017-10-12 22:14
its chrome - i switched to edge and it worked no problem

shane
2017-10-12 22:21
ok - if you edit the Machine - you want to change it to:
Runnable (enabled / true)
Stage set to "ubuntu-16.04-install"
Profiles - add "Ubuntu-16.04"

shane
2017-10-12 22:21
but we also want to fix your IPMI plugin

wdennis
2017-10-12 22:24
@shane Can one change a name of an existing profile?

2017-10-12 22:29
changed it but i am not seeing it refresh

shane
2017-10-12 22:29
@wdennis nope - you can destroy and recreate pretty easily - which is "sort of" a rename function :slightly_smiling_face:

2017-10-12 22:30
ok it updated

2017-10-12 22:36
for now to get through the exercise i rebooted the node and it is installing ubuntu 16.04

2017-10-12 22:36
uh-oh bad archive mirror

shane
2017-10-12 22:36
ok - that's a good start !

shane
2017-10-12 22:37
dang, I got happy too soon

2017-10-12 22:37
did i miss something in the subnet?

2017-10-12 22:39
it looks fine

2017-10-12 22:43
i removed the next server value as there is no other server

2017-10-12 22:44
it still hangs up on the mirror

2017-10-12 22:44
i need to stop for now

2017-10-12 22:45
need brain fuel - maybe come back to this tonight or in the morning - also need some place to help tell you enter IPMI details here

shane
2017-10-12 22:49
is 10.0.0.2 your valid DNS server ?

shane
2017-10-12 22:51
@iamjes - if you sign up for a Slack account ... we can direct message

wdennis
2017-10-12 22:51
@shane There's a drpcli command to dump the JSON of the profile?

shane
2017-10-12 22:51
yes

shane
2017-10-12 22:51
drpcli profiles show <profile-name>

wdennis
2017-10-12 22:52
And there's bash-completion for drpcli right?

shane
2017-10-12 22:53
you can do `drpcli profiles list | jq '.[].Name'`

shane
2017-10-12 22:53
that'll show the list of Named

shane
2017-10-12 22:53
profiles


shane
2017-10-12 22:54
you have to install the autocompletion

wdennis
2017-10-12 23:02
OK, failed to delete my profile that I want to rename...
```
[dradmin@dr-admin drp]$ drpcli profiles destroy os-install-necla-defaults
Error: Unable to destroy profile os-install-necla-defaults: unknown error (status 422): {resp:0xc42010c000}
```

wdennis
2017-10-12 23:05
Ah, it cannot be referenced anywhere?

wdennis
2017-10-12 23:05
Needs a better error message :wink:

wdennis
2017-10-12 23:05
Once I deleted the references, the destroy worked
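
A small sketch of finding which machines still reference a profile before trying to destroy it. It assumes the JSON from `drpcli machines list` is a list of objects with a "Name" and a "Profiles" list; those field names are an assumption and are not shown in this chat. Other object types may also hold references, and the node names in the sample data are made up for illustration.

```python
import json


def machines_using_profile(machines, profile_name):
    """Return names of machines whose Profiles list contains profile_name."""
    return [
        m.get("Name", "<unnamed>")
        for m in machines
        if profile_name in (m.get("Profiles") or [])
    ]


# Illustrative sample only; real data would come from `drpcli machines list`.
SAMPLE = json.loads("""
[
  {"Name": "testnode01", "Profiles": ["os-install-necla-defaults"]},
  {"Name": "testnode02", "Profiles": []},
  {"Name": "testnode03", "Profiles": null}
]
""")

if __name__ == "__main__":
    print(machines_using_profile(SAMPLE, "os-install-necla-defaults"))
```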

shane
2017-10-12 23:05
api/cli == less warm and fuzzy ...

wdennis
2017-10-12 23:06
Now to re-add it?

wdennis
2017-10-12 23:07
OK, worked

shane
2017-10-12 23:08
woot !

wdennis
2017-10-12 23:08
Hmmm, still in the UX as the old name?

shane
2017-10-12 23:08
"refresh"

shane
2017-10-12 23:08
if not that - try Shift-Reload

wdennis
2017-10-12 23:10
Yes, was a PEBKAC problem... Didn't rename the "Name" attrib in the JSON :stuck_out_tongue_winking_eye:

shane
2017-10-12 23:42
ah yes ... AI is going to solve the world's PEBKAC issues !!

greg
2017-10-12 23:47
Need an issue for @wdennis's problem. That is a swagger annotation error with delete of a profile.

wdennis
2017-10-12 23:48
Will open one

wdennis
2017-10-12 23:59
Now about the bash completion?

shane
2017-10-13 00:00
??


wdennis
2017-10-13 00:02
Running CentOS 7.3 - I see both `/etc/bash_completion.d/` and `/etc/profile.d/` - where is the correct target dir to put the bash completion into?

shane
2017-10-13 00:02
```
. /etc/bash_completion                 # On Ubuntu
. /etc/profile.d/bash_completion.sh    # On Centos
. /usr/local/etc/bash_completion       # On OS X with bash 4 installed.
```

wdennis
2017-10-13 00:03
Wonder what `/etc/bash_completion.d/` is for then?

shane
2017-10-13 00:04
if your linux version has that - create it as drp.bash in that directory

shane
2017-10-13 00:05
```
[root@5min-drp-ewr1-00 ~]# drpcli autocomplete /etc/bash_completion.d/drp.bash
[root@5min-drp-ewr1-00 ~]# exit
logout
Connection to 147.75.65.3 closed.
shane@gala:~/5min-drp$ ssh -x -i 5min-nodes-ssh-key root@147.75.65.3
Last login: Thu Oct 12 22:12:40 2017 from http://c-69-181-139-202.hsd1.ca.comcast.net
[root@5min-drp-ewr1-00 ~]# drpcli
autocomplete      events            interfaces        leases            plugin_providers  profiles          subnets           users
bootenvs          files             isos              machines          plugins           reservations      tasks             version
contents          info              jobs              params            prefs             stages            templates
```

wdennis
2017-10-13 00:27
Tried `drp.bash` as well as `drpcli` in that dir - no dice...

wdennis
2017-10-13 00:37
Ah, need to install support pkgs on CentOS/RHEL... `yum install bash-completion bash-completion-extras`

wdennis
2017-10-13 00:39
So it ended up being `sudo ./drpcli autocomplete /etc/bash_completion.d/drpcli` that worked

wdennis
2017-10-13 00:59
Have a workflow question...

wdennis
2017-10-13 01:00
How can one send existing machines into a profile's workflow?

wdennis
2017-10-13 01:03
Say it exists in machine inventory with the "local" bootenv... How to get it, when rebooted into PXE, to invoke another profile's workflow?

2017-10-13 01:09
@rackneng wdennis - have the slack app installed

wdennis
2017-10-13 01:10
Using Slack is so much better than Gitter for this

2017-10-13 01:12
never used it -only ever used chrome

wdennis
2017-10-13 01:13
you mean Gitter in Chrome (the browser?)

2017-10-13 01:13
yes

wdennis
2017-10-13 01:14
Did you request an invite to the community channel?

2017-10-13 01:15
ive been a member here for a long time just inactive

2017-10-13 01:22
@rackneng -wdennis to answer the first question 10.0.0.x is the network for nodes

2017-10-13 01:22
10.0.0.x is on eth1 and 192.168.1.x is on eth0

2017-10-13 01:24
now that i think about there can only be one gateway when there is two cards

shane
2017-10-13 01:25
@IAMJES - you need to have a valid DNS server passed in via the DHCP Options (option 6) to resolve the default Ubuntu Mirror

2017-10-13 01:26
can i have more than one?

shane
2017-10-13 01:26
the mirror used by default (unless you override it with a parameter) is: http://us.debian.org

wdennis
2017-10-13 01:27
@shane in System > Plugins, if I click "Add", I see the IPMI provider; but when I click "Use Provider", nothing happens...

shane
2017-10-13 01:27
I think eet eez a bugz in duh seeestem

shane
2017-10-13 01:27
(beta UI)

wdennis
2017-10-13 01:28
yeah yeah yeah ... :beetle:

shane
2017-10-13 01:28
I can provide you the CLI JSON equiv.

wdennis
2017-10-13 01:29
Let's have it

shane
2017-10-13 01:30
System --> Info & Preferences (that menu item was recently renamed)

shane
2017-10-13 01:30
DHCP options are comma separated, but haven't validated if we input and pass through that way

shane
2017-10-13 01:31
so for DHCP Option 6, in theory, you'd do: 8.8.8.8,8.8.4.4 (for example using google dns)

shane
2017-10-13 01:33
hmm ... grubbing through the Go code, I can't find a definitive answer for you offhand - but I suspect comma separated

shane
2017-10-13 01:33
if @greg is awake, he might know off the top of his head

greg
2017-10-13 01:34
I think it is but let me check

shane
2017-10-13 01:35
rfc spec is comma - and I only find models/dhcpOptions.go specifying "Multiple IP address" in the comment, but no parsing code to verify

wdennis
2017-10-13 01:37
(reposted)

greg
2017-10-13 01:39
rfc spec is comma for string specified items. The problem is that multiple IP addresses are byte encoded (not comma separated). But you found the right code, @shane.
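
To make the "byte encoded" point concrete, here is a minimal sketch of how a comma-separated list such as `8.8.8.8,8.8.4.4` ends up on the wire as DHCP option 6 under RFC 2132: one code byte, one length byte, then four bytes per address with no separators. This illustrates the wire format only; it is not DRP's parsing code, and whether DRP accepts comma-separated input is still unconfirmed at this point in the chat. The function name is hypothetical.

```python
import ipaddress

DNS_OPTION_CODE = 6  # "Domain Name Server" option in RFC 2132


def encode_dns_option(servers):
    """Encode 'a.b.c.d,e.f.g.h' as option code, length, then packed IPv4 addresses."""
    addrs = [
        ipaddress.IPv4Address(part.strip())
        for part in servers.split(",")
        if part.strip()
    ]
    if not addrs:
        raise ValueError("at least one DNS server is required")
    payload = b"".join(addr.packed for addr in addrs)
    if len(payload) > 255:
        raise ValueError("too many addresses for a single option")
    return bytes([DNS_OPTION_CODE, len(payload)]) + payload


if __name__ == "__main__":
    print(encode_dns_option("8.8.8.8,8.8.4.4").hex())
```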

shane
2017-10-13 01:39
@wdennis edit the Machines change stage/profile and enable Runnable

wdennis
2017-10-13 01:40
OK - not sure I'm doing this workflows thang right...

greg
2017-10-13 01:40
Yeah - we need a video/docs/discussion about stages and workflows.

2017-10-13 01:40
@rackneng - no luck on the dns option

wdennis
2017-10-13 01:41
@greg Your lips to God's ears

2017-10-13 01:41
i can ping the 10.0.0.x ip address

shane
2017-10-13 01:42
@iamjes - set your DNS option to just a single DNS for now (just in case) - and make sure you can reach it (eg from DRP Endpoint - do "host www.google.com") to make sure you have access to it

wdennis
2017-10-13 01:43
@shane what does the "not runnable" (runnable not selected) state mean?

2017-10-13 01:44
shane - sorry i am not familiar with all the end points

greg
2017-10-13 01:45
@IAMJES - do your machines have one interface or two? Does the PXE booting interface have internet access?

greg
2017-10-13 01:46
I think @shane meant DRP machine you are going to install.

greg
2017-10-13 01:47
@wdennis - "not runnable" means that the task system tried to run something and it failed. This could be because of a task error or it could be because of a bad stage map/profile setup. You can check jobs to see if there is a failed job in the list.

2017-10-13 01:47
@rackneng greg - PXE / DHCP is on eth1 for the 10.0.0.x net

greg
2017-10-13 01:47
Okay - does 10.0.0.x route out to the internet?

2017-10-13 01:49
checking

wdennis
2017-10-13 01:50
@greg So in the UX, in Systems > Machines, there is a "State" column, with either a green :white_check_mark: or a black "power switch" symbol

wdennis
2017-10-13 01:50
If I edit a node, I see a slider switch widget, that sets the node to Runnable (or not)

wdennis
2017-10-13 01:51
If "runnable" is set "off", what does that do/mean?

greg
2017-10-13 01:52
runnable off turns the black power switch on in the UX. it means that something in the task system needs to be examined.

greg
2017-10-13 01:54
The runnable flag is set to true by users or if the node runs `drpcli machines processjobs`. This is usually done as part of a bootenv (like sledgehammer or the install bootenvs).

greg
2017-10-13 01:54
It is meant to be an indicator that something might be amiss.

greg
2017-10-13 01:54
If a job fails, process jobs will wait until it becomes runnable again and retry the jobs.

wdennis
2017-10-13 01:55
OK... It seems that when I've installed a node, and the bootenv goes to "local", then the "runnable" switch is off

wdennis
2017-10-13 01:57
What I'm trying to do is take a node that's in a "local" bootenv (i.e., was previously installed), and set it to reinstall and trigger a workflow for the profile that I've assigned to it

greg
2017-10-13 01:57
okay - so apply the profile that has the workflow in it to the machine.

greg
2017-10-13 01:58
Set the machine's stage to the first stage in your workflow (like discover).

greg
2017-10-13 01:58
Reboot the node (so it PXE boots).

wdennis
2017-10-13 01:58
What I've done is edit the node, set the bootenv to (for example) `ubuntu-16.04-install`, set the profile to the desired one, and then set the stage to the first stage in the workflow

greg
2017-10-13 01:59
If you are using stages, you don't need to deal with bootenvs directly. Stages imply bootenvs.

greg
2017-10-13 02:00
That should work assuming you have a starting stage of like `ubuntu-16.04-install`

wdennis
2017-10-13 02:01
Yes, I see that now - if I select a stage of `ubuntu-16.04-install`, the bootenv is set to the same and not editable

greg
2017-10-13 02:01
It is good to see you guys use this and ask questions. I'm tweaking it now for 3.2. I'm going to try and make this more explicit.

wdennis
2017-10-13 02:01
Yes, the workflow has a starting stage of `ubuntu-16.04-install`

greg
2017-10-13 02:02
So, set the machine's stage to that. Make sure the machine has a profile with a `change-stage/map` parameter (or in global). Then PXE boot the machine.

wdennis
2017-10-13 02:03
I will check the profile

wdennis
2017-10-13 02:03
BTW, what's the IPMI plugin for (do)?

wdennis
2017-10-13 02:04
I activated it, but then in UX System > Plugins, can't seem to Add > Use Provider

greg
2017-10-13 02:05
The IPMI plugin makes IPMI calls to manage the bare metal machines.

greg
2017-10-13 02:05
It needs a node to have a couple of parameters to function.

wdennis
2017-10-13 02:06
Love to be able to use it...

wdennis
2017-10-13 02:06
Yes, my node's profile does have the stage map:
```
"change-stage/map": {
  "ssh-access": "complete-nowait:Success",
  "ubuntu-16.04-install": "ssh-access:Success"
},
```

greg
2017-10-13 02:07
okay - looks good.

wdennis
2017-10-13 02:08
I'm using some old Dell PE 860's to test with, so they tend to take a while to PXE -> install

2017-10-13 02:08
Time to feed the :bear:!

wdennis
2017-10-13 02:08
I'm not on site now, so hard to know what's going on with them once I set to PXE next boot & restart via IPMI...

greg
2017-10-13 02:10
yeah ...

wdennis
2017-10-13 02:10
So was hoping that the IPMI plugin would let me set the nodes to PXE next boot and restart?

greg
2017-10-13 02:10
It can

greg
2017-10-13 02:11
There is a content package that goes with it.

greg
2017-10-13 02:11
It can be used to configure the BMC.

wdennis
2017-10-13 02:11
Currently doing it by:
```
[dradmin@dr-admin ~]$ ipmitool -I lan -H testnode01-ipmi -U root -a chassis bootparam set bootflag force_pxe
Password:
Set Boot Device to force_pxe
[dradmin@dr-admin ~]$ ipmitool -I lan -H testnode01-ipmi -U root -a chassis power cycle
Password:
Chassis Power Control: Cycle
```

greg
2017-10-13 02:11
and set the parameters needed by the plugin.

greg
2017-10-13 02:12
it does those commands - well we use a slightly different one for the top one.

greg
2017-10-13 02:13
```drpcli plugin_providers list``` I think will show you the actions that will get added to a machine when the plugin provider is configured with a plugin and the machine has the required parameters.

greg
2017-10-13 02:14
```{ "AvailableActions": [ { "Command": "poweron", "OptionalParams": null, "Provider": "ipmi", "RequiredParams": [ "ipmi/username", "ipmi/password", "ipmi/address" ] }, { "Command": "poweroff", "OptionalParams": null, "Provider": "ipmi", "RequiredParams": [ "ipmi/username", "ipmi/password", "ipmi/address" ] }, { "Command": "powercycle", "OptionalParams": null, "Provider": "ipmi", "RequiredParams": [ "ipmi/username", "ipmi/password", "ipmi/address" ] }, { "Command": "nextbootpxe", "OptionalParams": null, "Provider": "ipmi", "RequiredParams": [ "ipmi/username", "ipmi/password", "ipmi/address" ] }, { "Command": "nextbootdisk", "OptionalParams": null, "Provider": "ipmi", "RequiredParams": [ "ipmi/username", "ipmi/password", "ipmi/address" ] }, { "Command": "identify", "OptionalParams": [ "ipmi/identify-duration" ], "Provider": "ipmi", "RequiredParams": [ "ipmi/username", "ipmi/password", "ipmi/address" ] } ], "Name": "ipmi", "OptionalParams": null, "Parameters": [ { "Available": true, "Description": "IP Address of the BMC", "Documentation": "This parameter is used by the IPMI Plugin to access the BMC", "Errors": [], "Meta": { "color": "blue", "icon": "address card outline", "title": "RackN Content" }, "Name": "ipmi/address", "ReadOnly": false, "Schema": { "type": "string" }, "Validated": true }, { "Available": true, "Description": "Username to access the BMC", "Documentation": "This parameter is used by the IPMI Plugin to access the BMC", "Errors": [], "Meta": { "color": "blue", "icon": "user circle", "title": "RackN Content" }, "Name": "ipmi/username", "ReadOnly": false, "Schema": { "type": "string" }, "Validated": true }, { "Available": true, "Description": "Password to access the BMC", "Documentation": "This parameter is used by the IPMI Plugin to access the BMC", "Errors": [], "Meta": { "color": "blue", "icon": "lock", "password": "hideme", "title": "RackN Content" }, "Name": "ipmi/password", "ReadOnly": false, "Schema": { "type": "string" }, "Validated": true }, { 
"Available": true, "Description": "Duration in seconds to leave the identify light on", "Documentation": "Duration in seconds to leave the identify light on", "Errors": [], "Meta": { "color": "blue", "icon": "podcast", "title": "RackN Content" }, "Name": "ipmi/identify-duration", "ReadOnly": false, "Schema": { "type": "integer" }, "Validated": true } ], "RequiredParams": null, "Version": "v1.0.0-0-3c742d9c049e008ad86d3c1cf2b420e44318bc9f" }```

greg
2017-10-13 02:15
That monstrosity is the ipmi plugin provider's definition. It shows the parameters that are needed to control it and the actions that machines get.

wdennis
2017-10-13 02:15
Yeah, will have to try the `nextbootpxe` and `powercycle` ones

greg
2017-10-13 02:16
For example, add `ipmi/address`, `ipmi/username`, `ipmi/password` to a machine as parameters (with good values) and the actions should show up in the UX and CLI. Then you can call them from the UX or CLI to drive those actions.
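
A minimal sketch of the rule described here: an action is offered for a machine only when the machine carries every parameter in the action's "RequiredParams". The provider data below is a trimmed copy of the definition pasted above; the function names and the sample parameter values (address, user, password) are hypothetical, and this is not the plugin's actual code.

```python
REQUIRED = ["ipmi/username", "ipmi/password", "ipmi/address"]

# Trimmed from the `drpcli plugin_providers list` output shown in the chat.
IPMI_PROVIDER = {
    "Name": "ipmi",
    "AvailableActions": [
        {"Command": "poweron", "OptionalParams": None, "RequiredParams": REQUIRED},
        {"Command": "poweroff", "OptionalParams": None, "RequiredParams": REQUIRED},
        {"Command": "powercycle", "OptionalParams": None, "RequiredParams": REQUIRED},
        {"Command": "nextbootpxe", "OptionalParams": None, "RequiredParams": REQUIRED},
        {"Command": "nextbootdisk", "OptionalParams": None, "RequiredParams": REQUIRED},
        {
            "Command": "identify",
            "OptionalParams": ["ipmi/identify-duration"],
            "RequiredParams": REQUIRED,
        },
    ],
}


def missing_params(action, machine_params):
    """List the required parameters the machine does not have yet."""
    return [p for p in (action.get("RequiredParams") or []) if p not in machine_params]


def usable_commands(provider, machine_params):
    """Return the commands whose required parameters are all present."""
    return [
        action["Command"]
        for action in provider["AvailableActions"]
        if not missing_params(action, machine_params)
    ]


if __name__ == "__main__":
    # Hypothetical machine parameters, for illustration only.
    params = {"ipmi/address": "10.0.0.50", "ipmi/username": "root"}
    print(usable_commands(IPMI_PROVIDER, params))
    print(missing_params(IPMI_PROVIDER["AvailableActions"][0], params))
```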

wdennis
2017-10-13 02:17
Do you know if the command should be `chassis bootparam set bootflag nextbootpxe`? Or something other?

wdennis
2017-10-13 02:18
Nope:

wdennis
2017-10-13 02:18
```
[dradmin@dr-admin ~]$ ipmitool -I lan -H testnode02-ipmi -U root -a chassis bootparam set bootflag nextbootpxe
Password:
Invalid argument: nextbootpxe
```

greg
2017-10-13 02:18
we use ```ipmitool <flags to connect> chassis bootdev pxe```

greg
2017-10-13 02:18
There are additional flags to make it persistent, but we are mostly just trying to get the next boot to be pxe.

wdennis
2017-10-13 02:19
Yup, that worked

wdennis
2017-10-13 02:19
What's your reboot one?

wdennis
2017-10-13 02:20
`chassis power cycle`?

greg
2017-10-13 02:21
yes
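
The two commands confirmed above can be assembled like this. This is only a sketch that builds the argument lists; it does not run `ipmitool`. The connection flags (`-I lan -H <host> -U <user> -a`) are the ones used in the sessions pasted in this chat, and the function names are hypothetical.

```python
import shlex


def ipmitool_argv(host, user, *subcommand):
    """Build an ipmitool command line using the connection flags from the chat."""
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user, "-a", *subcommand]


def next_boot_pxe(host, user):
    """Ask the BMC to PXE boot on the next boot."""
    return ipmitool_argv(host, user, "chassis", "bootdev", "pxe")


def power_cycle(host, user):
    """Power cycle the chassis."""
    return ipmitool_argv(host, user, "chassis", "power", "cycle")


if __name__ == "__main__":
    for argv in (
        next_boot_pxe("testnode01-ipmi", "root"),
        power_cycle("testnode01-ipmi", "root"),
    ):
        # Print a shell-safe rendering; pass argv to subprocess.run to execute.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Because `-a` makes ipmitool prompt for the password, running these unattended would need a different way of supplying credentials, which this chat does not cover.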

wdennis
2017-10-13 02:21
Cool, thx

wdennis
2017-10-13 02:21
I don't see any IPMI-related stuff in Content Packages...

wdennis
2017-10-13 02:25
Dude... I think I'm getting the workflow / stages thing now... :grinning:

greg
2017-10-13 02:26
Good - I'm about to change. :slightly_smiling_face:

wdennis
2017-10-13 02:27
First node is in Stage "complete-nowait" and BootEnv "local"

wdennis
2017-10-13 02:27
Let's log in and take a look...

greg
2017-10-13 02:27
that looks good. That looks like a success.

wdennis
2017-10-13 02:28
So wait wat - you are going to change the way Stages work now?? :cry:

greg
2017-10-13 02:28
A little bit. There are some issues with stages and workflows that we should fix before we get too much further along. The concepts will be the same and names won't change. The map is going to change a little.

greg
2017-10-13 02:29
I promise I'll document it just as well as the current scheme. :slightly_smiling_face:

wdennis
2017-10-13 02:29
lol

wdennis
2017-10-13 02:29
What is the sound of one hand clapping?

greg
2017-10-13 02:30
:slightly_smiling_face:

greg
2017-10-13 02:30
These are the changes we talked about in the community meetup around stage changes.

wdennis
2017-10-13 02:31
Yes - 'twould be great to have the Workflow stage map actually be graph-like...

wdennis
2017-10-13 02:31
Alpha-ordering the stages is a bit confusing

greg
2017-10-13 02:33
That is part of it: differentiating workflows in the same map space and tracking the current workflow in progress. These are nice usability changes that are coming. There are more subtle issues that we are addressing as well.

wdennis
2017-10-13 02:34
Tracking workflow in progress - W00T!

greg
2017-10-13 02:34
The success path and task failure paths through stages are fine and safe, but the random failure cases aren't handled completely safely.

wdennis
2017-10-13 02:34
Kind of like the v2 checkmarks thing?

greg
2017-10-13 02:35
Kinda - in a very gross way, you could use stages as roles and workflows as deployments, but that isn't really a completely accurate analogy.

greg
2017-10-13 02:36
The intent is to make workflows a parameter-based thing so it is all content driven.

wdennis
2017-10-13 02:36
I really can't wait until you guys document this - very powerful, but you gotta know how to use it...

greg
2017-10-13 02:36
The random failure cases are if machines randomly reboot (power outage, hurricane, ...) and catch machines mid cycle.

wdennis
2017-10-13 02:37
Yup, "inconsistent state"

greg
2017-10-13 02:37
exactly. The stages can have issues recovering from that.

greg
2017-10-13 02:37
A couple of tweaks and a few more pieces of actions will allow both stages and workflows to handle those cases as well.

shane
2017-10-13 02:52
@ctrees - I've updated the examples/5min-drp/ stuff ... however I think it's broken a bit due to some terraform provider plugin breakage by either Terraform or Packet.net - I'm not sure at the moment ... and I'm toast for the evening. I'll pick it back up again tmw morning. Please check out the README though - as it's updated with the "new way" of getting things, and AUTHing to get content - you no longer have to stage the RackN DRP Plugin content ...

2017-10-13 12:40
Morning all, anyone have a sample interfaces file I can look at? My ubuntu install keeps stopping and saying it can't reach the mirror

2017-10-13 12:40
# The primary network interface
auto enp2s0
iface enp2s0 inet static
    address 192.168.1.188
    netmask 255.255.255.0
    gateway 192.168.1.204
    dns-nameservers 10.0.0.2 192.168.1.204
# secondary eth
auto enxd8eb97bf66bc
iface enxd8eb97bf66bc inet static
    address 10.0.0.2
    netmask 255.255.255.0

ctrees
2017-10-13 13:38
@shane - Thanks... I'll start with the README now!

shane
2017-10-13 13:46
@iamjes - secondary eth looks right - you have your DHCP Options set for Default Gateway (Option 3) to be the DRP server for your provisioned machines - is your DRP endpoint routing traffic for your provisioned nodes ?

ctrees
2017-10-13 14:05
@shane - missing " - export BASE=htttps://.." sb> export BASE="https:...download" ?? correct ??

2017-10-13 14:06
@rackneng - shane it isn't, so I just finished working on bridges and will see what happens

shane
2017-10-13 14:10
@ctrees the RACKN_AUTH variable is set w/ `?`: `RACKN_AUTH="?username=${RACKN_USENAME}"`

ctrees
2017-10-13 14:18
@shane my comment was about a missing " in the README... is the "Download RackN plugins content and state it in the private-contents' a 'manual prep' or in the demo-run.sh ?

shane
2017-10-13 14:20
ah - I see :slightly_smiling_face:

ctrees
2017-10-13 14:20
so is the RACKN_AUTH also an export that needs set ? ...

shane
2017-10-13 14:20
nope - that's internal to bin/control.sh

shane
2017-10-13 14:20
demo-run.sh drives most of the actions through bin/control.sh

ctrees
2017-10-13 14:22
ok... well what I was confused about is if the lines after the export BASE are really part of the serial blob or reference to how it gets expanded...

ctrees
2017-10-13 14:23
the " is sort of either in the middle or end or I am not following

shane
2017-10-13 14:25
hmm - sorry - let me double check things - you don't need to download the RackN plugin content - not sure how I left that over in the README - checking

shane
2017-10-13 14:26
ugh - I have a git fail somewhere - I don't seem to have pushed the right README version in place :disappointed:

2017-10-13 14:28
so for now i am still failing on the ubuntu install and i get this in the log ... dr-provision2017/10/13 14:23:12.866150 sending block 0: code=0, error: TFTP Aborted

2017-10-13 14:29
i am still getting the bad mirror error after i have redone my networking to bridge. this next part is a little long so the only other thing i can think of is make a new mapping using debian...

2017-10-13 14:29
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto enp2s0
iface enp2s0 inet manual
# secondary network interface
auto enxd8eb97bf66bc
iface enxd8eb97bf66bc inet manual
# now start to bridge primary
auto br0
iface br0 inet static
    address 192.168.1.188
    netmask 255.255.255.0
    gateway 192.168.1.204
    dns-nameservers 192.168.1.204
    bridge_ports enp2s0
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0
# secondary bridge
auto br1
iface br1 inet static
    address 10.0.0.2
    netmask 255.255.255.0
    broadcast 10.0.0.255
    bridge_ports enxd8eb97bf66bc
    bridge_stp off
    bridge_fd 0
    bridge_maxwait 0

shane
2017-10-13 14:31
@iamjes - isn't your goal just to route the Provisioned Machines _through_ your DRP Endpoint - making it your "default router" for those Machines ?

shane
2017-10-13 14:33
@ctrees - ignore the RackN README plugin download stuff - it's now all handled in bin/control.sh - I'm fixing readme now

shane
2017-10-13 14:33
some how I stomped over a README change last night

2017-10-13 14:34
@rackneng shane it is, though I don't have any other ideas except to try debian

shane
2017-10-13 14:34
ok - you don't need bridges to do that

shane
2017-10-13 14:34
Linux will route for you

shane
2017-10-13 14:34
what Distro are you using for your DRP Endpoint ?

2017-10-13 14:35
ubuntu 16.04


2017-10-13 14:35
i just made a stagemap for debian and if it works then its the distro

shane
2017-10-13 14:36
basically - start at the "Enable IP forwarding" section in that web page (IP Forwarding)

shane
2017-10-13 14:37
if you have something else upstream that is NAT translating your 10.0.0.0/24 network for you - then you do not need the IP masquerading

2017-10-13 14:41
@rackneng - shane - i made a debian stagemap and i kept the bridging the same. debian7 is downloading now -

2017-10-13 14:46
it just quit; the location of the mirrors changed, I think

shane
2017-10-13 14:55
ok - @ctrees README is updated - and there were some fixes to the bin/control.sh earlier this morning - so I suggest you re-pull the entire content just to be safe

shane
2017-10-13 14:55
I'm walking through the process right now to validate it's all correct

2017-10-13 15:04
@rackneng shane - the way this configuration reads, enp2s0 would be eth0 and enxd8eb97bf66bc would be eth1 - then... I would have something like this?

2017-10-13 15:04
auto enp2s0
iface enp2s0 inet static
    address 192.168.1.188
    netmask 255.255.255.0
    gateway 192.168.1.204
    dns-nameservers 192.168.1.204
# secondary inteface
auto enxd8eb97bf66bc
iface enxd8eb97bf66bc inet static
    address 192.168.1.2
    netmask 255.255.255.0
    broadcast 192.168.1.255
    gateway 192.168.1.2

2017-10-13 15:06
and i removed the comment from the line it talks about for ipv4

ctrees
2017-10-13 16:05
--------------------------------------------------------------------------------
ACTION :: export DRP=bfaa26ef-8b14-450b-83ba-cc5421468a0f
Run next step? [ <Enter> | No | Ctrl-C ]
--------------------------------------------------------------------------------
Success...
/Users/cat/CodeOps/5min-drp/bin/control.sh: line 456: jq: command not found
--------------------------------------------------------------------------------
ACTION :: export ADDR=
Run next step? [ <Enter> | No | Ctrl-C ]

shane
2017-10-13 16:05
um

shane
2017-10-13 16:05
I guess I assume you have `jq` installed locally :slightly_smiling_face:

ctrees
2017-10-13 16:05
Seems like the ADDR did not pick up ?

ctrees
2017-10-13 16:06
the script actually asked packet for the server... and it is 'on-line' now

ctrees
2017-10-13 16:06
Should I go export the ADDR (IP I assume) and let the script continue ?

ctrees
2017-10-13 16:07
... woop that session will not pick up the export...

shane
2017-10-13 16:07
you can cancel "demo-run.sh" any time - restart it, then just answer "N" to the previous items run already

shane
2017-10-13 16:07
correct

ctrees
2017-10-13 16:08
... should I just re-run.... it could be an artifact of 'clean project' ??

shane
2017-10-13 16:08
I need to see why the "prereqs()" failed - it checks for `jq`

shane
2017-10-13 16:08
nope - I regularly run this as a "clean" project

ctrees
2017-10-13 16:09
not sure about jq ... it's on a macmini (my messy machine)

shane
2017-10-13 16:09
what does: `which jq` return ?

ctrees
2017-10-13 16:09
I was going to build it on a clean CentOS VM desktop... BUT I have fought the time sync with vbox...

ctrees
2017-10-13 16:10
you're right, it's not installed

shane
2017-10-13 16:10
the tooling generally works very hard to keep your environment contained in the 5min-drp directory - the only caveat I can recall is the requirement to modify the `~/.terraformrc`

shane
2017-10-13 16:10
and my handling of that mod can get it in a confused state - not enough "idempotency" around the mod/restore of that file

2017-10-13 16:11
@rackneng - shane - I made a new configuration and the install still comes back as a bad mirror

shane
2017-10-13 16:11
@ctrees I'm guessing you ignored an earlier error message ?

shane
2017-10-13 16:11
```
case $_OS_FAMILY in
  rhel) sudo yum -y install $_pkgs; xit $? ;;
  debian) sudo apt -y install $_pkgs; xit $? ;;
  darwin) ;;
  *) xiterr 4 "unsupported _OS_FAMILY ('$_OS_FAMILY') in prereqs()" ;;
```

shane
2017-10-13 16:11
you should have seen "unsupported..."

ctrees
2017-10-13 16:11
OH... on the terraform, I just put that bin into 5min-drp/bin/

2017-10-13 16:11
i ran a dig command and it looks like it worked
rebar@rebar:~$ dig 192.168.0.10 google.com
; <<>> DiG 9.10.3-P4-Ubuntu <<>> 192.168.0.10 google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11420
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;192.168.0.10. IN A
;; ANSWER SECTION:
192.168.0.10. 86400 IN A 192.168.0.10
;; Query time: 4 msec
;; SERVER: 192.168.1.204#53(192.168.1.204)
;; WHEN: Fri Oct 13 11:09:04 CDT 2017
;; MSG SIZE rcvd: 57
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62907
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 155 IN A 172.217.9.174
;; Query time: 4 msec
;; SERVER: 192.168.1.204#53(192.168.1.204)
;; WHEN: Fri Oct 13 11:09:04 CDT 2017
;; MSG SIZE rcvd: 55

2017-10-13 16:15
hi there guys!

2017-10-13 16:16
can someone show me how a proper bootenv file with an ssh key should look

2017-10-13 16:17
```"OptionalParams": [ "access-keys", "kernel-console" ],```

2017-10-13 16:17
right?

2017-10-13 16:21
my goal is to boot sledgehammer and be able to access it with a public key

2017-10-13 16:32
I reviewed the Digital Rebar Online Meetup 2 on YouTube, but Victor shows only the installation part, and there are too many dependencies from control to profiles for me, which looks like magic

shane
2017-10-13 16:39
@kolomnitcki - we're rolling up a shorter, more concise vid in a little bit

shane
2017-10-13 16:39
the basic method is to modify the "global" profile

2017-10-13 16:48
ok, I'll try playing with global and won't distract you today =)

zehicle
2017-10-13 16:48
one sec... it's buried in a video

2017-10-13 16:53
@kolomnitcki here's the link https://youtu.be/pHp6cHF11IM?t=371

2017-10-13 16:54
I thought it was on the v3.1 playlist already but it was not. Corrected that

2017-10-13 16:55
that's exactly what you were asking about

2017-10-13 16:55
ty

2017-10-13 18:14
Afternoon, how or where do i go to sign up for some of the 'pay services' for DR / RackN ?

2017-10-13 18:15
@IAMJES I'll work with you 1x1 to get connected

2017-10-13 18:15
@IAMJES in general, we direct people to rackn.com/beta

2017-10-13 18:16
@zehicle - Ok thanks! I am working through a network issue at the moment so i am almost ready

2017-10-13 18:19
@zehicle is the quickstart script still valid?

2017-10-13 18:20
it should be - we keep it updated.

2017-10-13 18:21
the instructions inside the script are usually right if in doubt

2017-10-13 18:22
v3 (DRP) quickstart only. We (RackN) are not maintaining v2 for new users

johnsutten
2017-10-13 18:28
has joined #json

2017-10-13 18:43
Just curious, which OS are most of you using to deploy DRP ?

shane
2017-10-13 19:05
@johnsutten - Linux :slightly_smiling_face:

johnsutten
2017-10-13 19:06
Centos / Ubuntu / Red Hat / Oracle Enterprise Linux ?

shane
2017-10-13 19:06
centos/ubuntu are the two most popular at RackN

shane
2017-10-13 19:07
I use centos/ubuntu equally - the Linux distro really doesn't matter much

shane
2017-10-13 19:07
we'd suggest a modern version (eg centos 7 or ubuntu 16) that uses systemd - but that's not really required

shane
2017-10-13 19:08
the DRP endpoint is a Go Lang binary - there are very very few external dependencies to operate DRP (by design)

2017-10-13 21:15
@kolomnitcki focused SSH key add video: https://youtu.be/StQql8Xn08c

johnsutten
2017-10-13 21:26
do we have any experts in here on networking? I have tried all kinds of things and I still am not able to get eth0 and eth1 to work together

2017-10-13 21:41
@zehicle thank you Rob, it works after you pointed me to the previous video. But now when I power up my VMs I get "Permission denied (publickey)" - very strange.

shane
2017-10-13 21:56
@kolomnitcki - when you ssh to the provisioned machine - can you please add "-v" to your SSH options, and provide that output here ?

2017-10-13 21:59
https://pastebin.com/zM5GVLvF

2017-10-13 22:02
for now, I'm going back to a clean VM snapshot and will install drp again; maybe I broke something while playing

shane
2017-10-13 22:08
first, the "-T" option says do not allocate a pseudo-TTY - which means you can't log in ... though that is not the problem here; even if the command succeeded, you wouldn't get a shell

shane
2017-10-13 22:09
First, please check that the SSH **private** key half to the public key has appropriate permissions (chmod 600 FILE). Second, please verify you are using the __correct__ private key half when you connect to your provisioned host - the one that matches the public key half you put in the parameter

shane
2017-10-13 22:10
in this case - the private key half is `/home/stanislav/.ssh/id_rsa` - so you need to be using the public half in the Parameter (presumably it's the `/home/stanislav/.ssh/id_rsa.pub` file)
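A minimal sketch of those two checks in Python, assuming a POSIX host. The function names are hypothetical, and the public-key comparison only looks at the key type and base64 body (the trailing comment is ignored); it does not prove the private key matches, it only confirms that the `.pub` file on disk is the same key that was pasted into the parameter.

```python
import os
import stat


def private_key_mode_ok(path: str) -> bool:
    """Return True when the key file grants no access to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def public_key_matches(pub_path: str, param_value: str) -> bool:
    """Compare key type and base64 body, ignoring the trailing comment."""
    with open(pub_path) as handle:
        on_disk = handle.read().split()[:2]
    return on_disk == param_value.split()[:2]
```

For the paths mentioned above, that would be `private_key_mode_ok("/home/stanislav/.ssh/id_rsa")` and `public_key_matches("/home/stanislav/.ssh/id_rsa.pub", value_from_parameter)`, where `value_from_parameter` is whatever string was stored in the parameter.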

2017-10-13 22:16
it's ok, it was just an ubuntu thing. The clean VM didn't want to connect either. Fixed after a reboot

2017-10-13 22:27
now my goal is to find a way to boot a FreeBSD live img. Where can I read about ``explode_iso.sh``?

shane
2017-10-13 22:28
you can read the shell script - located (in isolated mode) in drp-data/tftpboot/explode_iso.sh, or (in production mode) in /var/lib/dr-provision/tftpboot/explode_iso.sh

shane
2017-10-13 22:29
usage options are basically (as found in the script):
```
echo "Explode iso $1 $2 $3 $4"
os_name="$1"
tftproot="$2"
iso="$3"
os_install_dir="$4"
expected_sha="$5"
```

shane
2017-10-13 22:30
here's an example I documented for exploding Centos 7
```
export ISO_DIR=/home/vagrant/drp-data/tftpboot
sudo $ISO_DIR/explode_iso.sh \
  ce-centos-7.3.1611-install \
  $ISO_DIR \
  $ISO_DIR/isos/CentOS-7-x86_64-Minimal-1611.iso \
  $ISO_DIR/centos-7.3.1611/install
```

2017-10-13 22:37
who calls it? is it executed on ``drpcli bootenvs uploadiso``?

shane
2017-10-13 22:38
yes, or if a new ISO exists in the tftpboot/isos/ directory - the dr-provision daemon will do it on restart, or on -HUP signal (eg `kill -1 PID` or `kill -HUP PID`, where PID is the dr-provision process)

2017-10-13 22:42
ok, but for testing I need ``memdisk`` from ``syslinux-6.03`` and I don't want to create a complicated arch like sledgehammer for now. Can I just create some directories and put files there, then create bootenv.json without the ``IsoUrl`` directive?

shane
2017-10-13 22:46
presumably - yes you can - but I haven't tested / tried this ... YMMV

2017-10-14 00:26
well, I'm not able to debug the bootenv with the FreeBSD img; ``sending block 0: code=0, error: TFTP Aborted`` is the most information I get with ``--debug-bootenv=2``

wdennis
2017-10-14 01:52
@johnsutten I may be able to help - please describe your network topology- you have two private (RFC1918) networks I see, 192.168.1.0/24 and 10.0.0.0/24... which one has Internet connectivity?

johnsutten
2017-10-14 02:10
@wdennis may have it solved - will know in a moment

johnsutten
2017-10-14 02:14
Nope this was my latest attempt

johnsutten
2017-10-14 02:14
# The primary network interface
auto eth0
iface eth0 inet static
    address 192.168.1.188
    netmask 255.255.255.0
    gateway 192.168.1.204
    dns-nameservers 192.168.0.1 192.168.1.204
#secondary network
auto eth1
iface eth1 inet static
    address 192.168.0.1
    netmask 255.255.255.0
    network 192.168.0.0
    broadcast 192.168.0.0
    post-up route add -net 192.168.0.0 netmask 255.255.255.0 gw 192.168.1.204

johnsutten
2017-10-14 02:15
@wdennis I will remove the post-up line

johnsutten
2017-10-14 02:17
eth0 is connected to the internet eth1 serves the nodes

wdennis
2017-10-14 14:52
@IAMJES Then two questions:
1) does the 192.168.1.0/24 network on eth0 have a router (should be the "gateway" IP device) that does NAT? 192.168.x.x addresses aren't routable on the Internet
2) Is the DRP node acting as the router between the nodes network (192.168.0.0/24 in your example above) and the Internet-connected network (192.168.1.0/24 in your example above) ?

wdennis
2017-10-14 15:00
If the DRP server can curl stuff from the DR repo, then #1 should be a "yes" (i.e. the DRP server has working Internet connectivity and also working DNS, so it can resolve names into IP addresses)

wdennis
2017-10-14 15:01
If the above is *not* working, that's step one to resolve and get working

wdennis
2017-10-14 15:18
If it *is* working, please run the following commands on the DRP server and post the output: 1) ip route show 2) sysctl net.ipv4.ip_forward

ctrees
2017-10-14 16:28
Say... I see: scientificlinux-6.8-install ... does anybody know if another (probably university IT geek) is working on OpenAFS server infrastructure scripts ? (That's my goal is to setup testing env for OpenAFS Disaster Recovery)

ctrees
2017-10-14 16:32
global 'interest' in OpenAFS has reduced since CERN is transitioning off... but I'm curious who was motivated to put scientificlinux-6.8-install in

2017-10-14 17:05
> *[ctrees]* Say... I see: scientificlinux-6.8-install ... does anybody know if another (probably university IT geek) is working on OpenAFS server infrastructure scripts ? (That's my goal is to setup testing env for OpenAFS Disaster Recovery)
I know the cluster at https://en.wikipedia.org/wiki/Faculty_of_Biology_(Moscow_State_University) uses Scientific Linux, but they use the ``Lustre`` file system

johnsutten
2017-10-15 00:38
@wdennis
class@class:~$ ip route show
default via 192.168.1.204 dev eth0
192.168.0.0/24 dev eth1 proto kernel scope link src 192.168.0.1
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.188

johnsutten
2017-10-15 00:38
class@class:~$ sudo sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1

wdennis
2017-10-15 01:07
@johnsutten OK, that looks fine - you have routing between interfaces turned on in the kernel, so the DRP box is acting as a router for the nodes network.

wdennis
2017-10-15 01:08
Routing table looks OK as well...

wdennis
2017-10-15 01:10
Is the upstream router at 192.168.1.204 doing NAT, or something beyond that?

wdennis
2017-10-15 01:11
Could you show the output of this command: `traceroute 8.8.8.8`

wdennis
2017-10-15 01:11
(from the DRP server)

johnsutten
2017-10-15 01:51
traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1 192.168.1.204 (192.168.1.204) 0.571 ms 0.623 ms 0.735 ms
 2 47.187.64.1 (47.187.64.1) 6.802 ms 6.818 ms 6.829 ms
 3 172.102.49.250 (172.102.49.250) 6.876 ms 6.839 ms 6.942 ms
 4 ae7---0.scr01.dlls.tx.frontiernet.net (74.40.3.17) 6.939 ms ae8---0.scr02.dlls.tx.frontiernet.net (74.40.3.25) 6.485 ms 6.646 ms
 5 ae1---0.cbr01.dlls.tx.frontiernet.net (74.40.1.82) 6.878 ms 7.765 ms ae0---0.cbr01.dlls.tx.frontiernet.net (74.40.4.14) 19.332 ms
 6 74.40.26.234 (74.40.26.234) 8.335 ms 6.543 ms 6.604 ms
 7 108.170.240.193 (108.170.240.193) 7.265 ms 3.002 ms 108.170.252.129 (108.170.252.129) 4.349 ms
 8 209.85.248.171 (209.85.248.171) 3.306 ms 108.170.230.145 (108.170.230.145) 3.131 ms 216.239.62.77 (216.239.62.77) 4.259 ms
 9 google-public-dns-a.google.com (8.8.8.8) 4.580 ms 7.319 ms 7.354 ms

wdennis
2017-10-15 02:07
@johnsutten OK, assuming that the upstream router at 192.168.1.204 is doing NAT, since you are reaching things on the Internet

wdennis
2017-10-15 02:08
OK, now the big question is: Is the upstream router NAT-ing *all* private IPv4 addresses, or just the 192.168.1.0/24 ones?

wdennis
2017-10-15 02:09
Do you have any working nodes on 192.168.0.0/24 that we can test from? (other than the DRP server of course)

johnsutten
2017-10-15 13:20
@wdennis I can put a node on with sledgehammer. I haven't set up ssh yet, I think...

johnsutten
2017-10-15 14:55
@wdennis I also have a spare router I can connect and make the network so the server owns 10.0.0.x

wdennis
2017-10-15 15:23
@johnsutten We need a device you can log into on the nodes network

johnsutten
2017-10-15 15:26
@wdennis I can do that here... Secondary router takes on a new network and connect a laptop to it and set the router. The router is connected directly to the internet and I can put the routers IP address as fixed and in DMZ

johnsutten
2017-10-15 15:28
this way 'class' only has to deal with one network and subnet instead of two

wdennis
2017-10-15 16:08
And then I guess the DR server ("class"?) has a leg on both networks, but does not have to act as a router... right?

johnsutten
2017-10-15 16:29
@wdennis no it doesn't though i do like the idea of having refined control on this network

wdennis
2017-10-15 16:30
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F7JJB7DT6/image_uploaded_from_ios.png and commented: My understanding of your latest topology...

johnsutten
2017-10-15 16:43
@johnsutten uploaded a file: https://rackn.slack.com/files/U7J7U5DA9/F7HT27E4Q/network1.png and commented: Modified network layout

wdennis
2017-10-15 17:17
@johnsutten if you do something like that, make sure that the border router does NAT for both the 192.168.1.0/24 as well as the 10.0.0.0/24 networks

johnsutten
2017-10-15 17:19
my cell phone and this machine is in the 192.168.1.x network and no one has yelled at me yet

johnsutten
2017-10-15 17:22
ill reboot this machine and let you know if i see any issues

johnsutten
2017-10-15 17:26
@wdennis looks like everything is doing what it is supposed to do.

johnsutten
2017-10-15 17:30
node is installed, ill tighten up ports accordingly...

2017-10-15 20:30
has someone already written an ansible role for drp deploy?

2017-10-15 20:46
today I wrote only the download logic and it doesn't feel like the right way
```
1) absent dr-provision.sha256 from /tmp
2) present dr-provision-releases directory
3) download dr-provision.sha256 to /tmp
4) check if dr-provision.zip exists
5) check sha256 of dr-provision.zip
when dr-provision.zip does not exist or sha256sum mismatches -- need to update or this is a new install
```
too many checks just to start a download
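The decision at the end of that list (download when the zip is missing or its checksum differs) can be collapsed into one check. A minimal Python sketch, assuming the `.sha256` file holds the hex digest as its first whitespace-separated field; the function names are hypothetical and this is not an existing Ansible role or part of tools/install.sh.

```python
import hashlib
import os


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large zips do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sha_file_text: str) -> str:
    """Assumption: the digest is the first field of the .sha256 file."""
    return sha_file_text.split()[0].lower()


def needs_download(zip_path: str, expected_sha256: str) -> bool:
    """True for a new install (no zip yet) or an update (digest mismatch)."""
    if not os.path.exists(zip_path):
        return True
    return sha256_of(zip_path) != expected_sha256.strip().lower()
```

In an Ansible role the same idea is usually one task that fetches the file with a checksum and lets the module skip the download when the file already matches.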

2017-10-15 20:49
@kolomnitcki check the Terraform that Shane did in https://github.com/digitalrebar/provision/tree/master/examples/5min-drp

2017-10-15 20:50
it already sounds like you are using the tools/install.sh script as a reference (which is what I'd recommend)

2017-10-15 20:56
i have some issues with install.sh, like 7z in centos7 is - 7za

johnsutten
2017-10-15 22:02
@wdennis Thank you for all of your help!

wdennis
2017-10-15 22:15
@johnsutten n/p, glad it's working for you now

johnsutten
2017-10-15 22:26
Not to worry I will need to find and update the ssh logins for the default Ubuntu instances and get into how to deploy openstack

zehicle
2017-10-15 22:28
likely @rstarmer and @carl have an opinion on that

johnsutten
2017-10-15 22:28
@zehicle ok thanks

zehicle
2017-10-15 22:28
FWIW, I'd recommend looking into the Ansible or Kolla work. You could feed the Ansible from the dynamic inventory integration

zehicle
2017-10-15 22:29
is heads down making a new UX feature for bulk editing

wdennis
2017-10-15 23:30
@zehicle Any plans on DRP supporting _running_ an Ansible playbook from the DRP endpoint?

wdennis
2017-10-15 23:30
(i.e. as a workflow stage?)

zehicle
2017-10-16 00:03
@wdennis like v2 used to do? would be possible as a plug-in. one missing concept is that DRP has no SSH user context at all.

zehicle
2017-10-16 00:04
to do it from DRP server, you'd need a plug-in. You could do it from a stage on a single node (aka Ansible local) basis. That would not cross nodes, but could use Ansible to do the node config. Would be the same for chef solo or puppet

wdennis
2017-10-16 13:28
single-node would work for now... It's basically for post-install config for a given purpose which would be based on profile(s)

wdennis
2017-10-16 13:30
So would need the kickstart/preseed to prep for running Ansible I guess... Then the post-install stage would download a playbook from Git and run it locally

wdennis
2017-10-16 13:33
I'm aiming for a single-pass install process, instead of my current "install the node (which is remote Ansible-ready) with one tool, then run Ansible from another system on the newly-installed node"

ctrees
2017-10-16 13:57
@wdennis @zehicle is what you're talking about using the same process as the kubespray demos ? as another content package ?

ctrees
2017-10-16 14:04
your current process allows you to pass off the infrastructure to 'another node'... I was thinking of doing that sort of process to hand off to CI/CD... so I was curious of the 'motivation' for 'single-pass' install as I seem to be getting into that debate with our group right now... aka figuring out the best life-cycle pattern for H/W recycle, upgrade... blah blah...

ctrees
2017-10-16 14:09
I've been promoting a 'scorched earth' method just because that forces us to practice DR, but we have so much legacy custom code...

shane
2017-10-16 14:09
"immutable infrastructure" might be a better phrase ... :slightly_smiling_face:

shane
2017-10-16 14:10
DRP is designed to support that - particularly with the read-only layers of content - helps to prevent unintentional (or intentional) changes in the field to provisioning templates/etc. to help insure repeatable deployments without any drift

ctrees
2017-10-16 14:11
... naw, 'scorched earth' put's it into a good frame of mind for debate... if I had to 'sell', I'd use 'immutable infra' :wink:

shane
2017-10-16 14:12
the hard part of that is application data and state - you have to be able to support application data separation from the app itself ... or excruciatingly carefully planned templates/profiles/etc. to not destroy content on system when provisioning

wdennis
2017-10-16 14:13
@ctrees I want my OS installer to be able to trigger an Ansible run (either remote or local) against the node(s) it just installed.

ctrees
2017-10-16 14:15
@wdennis OH... how did rob trigger the kubespray stuff ? ... wait... he had a ui step in that... you're going full auto ?

zehicle
2017-10-16 14:16
@ctrees I've been using "lather, rinse, repeat" but the scorched earth matches the "create, destroy, recreate" approach from cloud

zehicle
2017-10-16 14:17
@ctrees it's not triggered - it's a stand alone step. There is a DRPCLI wait for X step, so you could build a script

zehicle
2017-10-16 14:17
that would get machines ready then wait for them to hit a complete state and then run the ansible-playbook command

ctrees
2017-10-16 14:18
btw... I'm sure I can't do scorched earth in prod, but want to push the idea into CI/CD for sure...

ctrees
2017-10-16 14:19
Yup that's the demo I'm referring to

wdennis
2017-10-16 14:19
@ctrees I believe Rob ran it outboard like I do - except they have provided an Ansible dynamic inventory script that builds the inventory from the DRP profiles the machines are in (correct if wrong @zehicle )

shane
2017-10-16 14:20
you got it right; @wdennis

wdennis
2017-10-16 14:21
Ok - all I'm wanting (:stuck_out_tongue_winking_eye:) is a way to trigger the Ansible run from a DRP stage

ctrees
2017-10-16 14:22
Oh... trigger from DRP not from < fill in blank external event >

greg
2017-10-16 14:23
@wdennis - couple of ways. @ctrees and @zehicle alluded to one. Let me get to a keyboard to elaborate.

ctrees
2017-10-16 14:37
The kubespray content pack - name - comes from


ctrees
2017-10-16 14:38
... eventually ?...

ctrees
2017-10-16 14:39
What I was going to do is use the kubespray model but build up an OpenAFS ansible deployment

greg
2017-10-16 14:39
There is a kubespray in the RackN content - log into the RackN portal and you should see it.

greg
2017-10-16 14:40
@wdennis - I can think of a couple of ways to do what you want.

ctrees
2017-10-16 14:40
I see now that Rob was using DRP to generate the dynamic inventory for kubespray... I get that now (thanks)

greg
2017-10-16 14:41
1. external event handler - using websocket interface (or drpcli wrapper for it), wait for an event (like machine update (stage == TRIGGER_ANSIBLE)), then call ansible script that would run on the system and change stage to ansible finished.

greg
2017-10-16 14:42
2. Write a plugin that does #1 for you. It would be a "Publish" plugin that registers an event publisher and then does the same processing.

greg
2017-10-16 14:43
3. Write a plugin that adds a machine action to the node called "RunAnsible" or something like that. Use parameter injection to drive the playbook and vars you want, even calling out to the inventory script to help create the inventory file. Call the RunAnsible command from the stage/task that you want.

greg
2017-10-16 14:45
4. Write a content bundle that installs ansible on the machine in question, gits/gets the playbook in question, and then runs ansible on the node with a custom inventory.

greg
2017-10-16 14:45
So - lots of options at various levels of coding.
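Option 1 can be approximated without the websocket interface by polling. The sketch below is only an illustration under stated assumptions: `wait_for_stage` is a hypothetical helper, the stage name `TRIGGER_ANSIBLE` is the placeholder from the list above, `drpcli machines show` is assumed to print the machine as JSON with a `Stage` field, and the inventory and playbook paths are made up.

```python
import json
import subprocess
import time
from typing import Callable


def wait_for_stage(get_stage: Callable[[], str], wanted: str,
                   timeout: float = 600.0, interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_stage() until it returns `wanted`; False if time runs out."""
    deadline = clock() + timeout
    while True:
        if get_stage() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def drpcli_stage(uuid: str) -> str:
    """Assumption: `drpcli machines show <uuid>` emits JSON with a Stage key."""
    result = subprocess.run(["drpcli", "machines", "show", uuid],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)["Stage"]


def run_when_ready(uuid: str, inventory: str, playbook: str) -> None:
    """Run the playbook once the machine reaches the trigger stage."""
    if wait_for_stage(lambda: drpcli_stage(uuid), "TRIGGER_ANSIBLE"):
        subprocess.run(["ansible-playbook", "-i", inventory, playbook],
                       check=True)
```

Changing the stage afterwards (the "ansible finished" step in option 1) is left out here, because the exact call depends on how the content is set up.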

greg
2017-10-16 14:47
One of the big differences around DRP and DRv2 is that DRP doesn't have a built in dependency system. So, sequencing tasks across nodes is not as easy. I'm keeping that in the back of my mind, but currently we don't have plans to pull in that dependency model.

greg
2017-10-16 14:47
And I mean cross-node dependency system. Inside a node, we do with task lists and stages.

greg
2017-10-16 14:48
node == machine - Greg's brain is slow.

wdennis
2017-10-16 15:00
@greg Thanks - good to know where the design boundaries are

greg
2017-10-16 15:02
The sync points are now up to you. With the fact that the API can operate atomically on parameters, you can create your own sync points. Like choose a profile for your "cluster" and tweak params as needed to create sync points. I'm not sure if we'll move that into a first order feature.

wdennis
2017-10-16 15:04
Looks like one could make a call to either Jenkins or maybe AWX ( https://www.jeffgeerling.com/blog/2017/ansible-open-sources-ansible-tower-awx ) to run a given playbook on a node - would need the dynamic inventory functionality to define the target(s)

zehicle
2017-10-16 15:07
I've looked at a tower integration using the API. would be possible to do as a plugin to register nodes

johnsutten
2017-10-16 15:10
Morning all, why is it in the Ubuntu installs there are LVM groups and ext2 used?

greg
2017-10-16 15:19
History more than anything else, @johnsutten. The initial ubuntu preseed was written around 9.04 or so. We've been minimally changing it, and it was used across the board for ubuntus and debians. We could probably update it.

johnsutten
2017-10-16 15:29
@greg This morning installs are moving a whole lot better until about the 3rd phase on 16.04 and it hangs on Running dpkg

wdennis
2017-10-16 15:33
@zehicle That would be cool, but would love to not have a whole 'nother software stack to maintain just to do the Ansible runs... But I do get having product design limits. Just would be cool to have DRP be able to "do it all" :slightly_smiling_face:

johnsutten
2017-10-16 15:35
Why is there an easter egg in my UI?

johnsutten
2017-10-16 15:59
Whenever I attempt to provision a new machine starting with the preferences I still get this error "dr-provision2017/10/16 15:57:39.156842 sending block 0: code=0, error: TFTP Aborted"

zehicle
2017-10-16 16:13
@johnsutten we needed to fill that space w something

johnsutten
2017-10-16 16:39
@zehicle @greg I am working on my documentation for installations and upgrades. Over the weekend here I documented what I had to do to get the installs completed. The first thing I did was a 14.04 LTS install and was able to make 14.04 (Ubuntu) node installs. I upgraded my Ubuntu 14.04 to 16.04 where the DRP resides and now I am not able to complete any Ubuntu node installations regardless if i use the 'tip' or 'stable' install

johnsutten
2017-10-16 16:43
Also, in your documentation please update it so it reads that certain OS releases are mandatory for DRP. Once i reach 10 Ubuntu nodes I will be buying Canonical support and I am not excited about having this experience again when i upgrade. Has there been any testing to see how DRP works with deploying nodes and the like after upgrading from 16.04 to 17.10?

wdennis
2017-10-16 16:58
UX want: When the auth token for the endpoint access expires, please display a message that "Login time exceeded" or the like - I go to do a screen refresh, and just get the "endless spinner"... Have to do a browser refresh, and then I get the endpoint login prompt.

shane
2017-10-16 17:03
@wdennis - I have an issue open on this

wdennis
2017-10-16 17:03
@shane cool

wdennis
2017-10-16 17:03
Run into that one every day :wink:

johnsutten
2017-10-16 17:05
If i use the 'tip' install, will it eventually be stable ?

zehicle
2017-10-16 17:11
tip always tracks "stable" active dev for the new release and keeps moving

zehicle
2017-10-16 17:12
when we cut v3.2, stable will move to that

zehicle
2017-10-16 17:12
we try to keep tip working so it's usable

zehicle
2017-10-16 17:12
it's _not_ just the master branch

johnsutten
2017-10-16 17:13
@zehicle Moving to a production environment with DRP then I should stay with the stable install?

johnsutten
2017-10-16 17:59
As I begin a test environment with a stable install on Ubuntu 16.04 I do not have any content packages to add. The two that installed are the backing store and the Digital Rebar Provision Community Content. All of the other 'new content' is only for a 'tip' install.

greg
2017-10-16 17:59
They will work in the stable build.

greg
2017-10-16 17:59
We are working on a UI selector to let you choose stable vs tip content.

johnsutten
2017-10-16 18:01
@greg what are the recommended packages from a base install perspective

greg
2017-10-16 18:05
For DRP, you need bsdtar, pzip, and 7z. Otherwise, DRP doesn't care too much.

johnsutten
2017-10-16 18:12
Once in the DRP it seems that os-discovery and os-linux needs to be added to the Content packages to fill in the fields from the dropdowns in the Info and preferences

greg
2017-10-16 18:12
yes - that is true.

johnsutten
2017-10-16 18:13
what am i missing now to enable the default stage of the preferences? i am not able to select discover yet

greg
2017-10-16 18:13
You need to make sure that you have included the iso for sledgehammer.

greg
2017-10-16 18:14
You can check the discovery stage under stages.

johnsutten
2017-10-16 18:14
i have included both

greg
2017-10-16 18:14
Are you on tip or stable?

greg
2017-10-16 18:14
if you are stable, you need to restart the service.

johnsutten
2017-10-16 18:14
stable

johnsutten
2017-10-16 18:14
ok

johnsutten
2017-10-16 18:19
ok - discover came in

johnsutten
2017-10-16 18:37
I have been getting this error as well. Machine d3d3bc6b-9543-4f99-870e-d2f7b926f891 wants Stage ubuntu-16.04-install, which is not available


johnsutten
2017-10-16 18:38
I am not able to change the boot environment from sledgehammer to anything else. Perhaps i am in error, that all three was to be set to ubuntu 16 install ?

johnsutten
2017-10-16 18:39
(boot, stage, profile ?)

zehicle
2017-10-16 19:01
Docs patches welcome!

johnsutten
2017-10-16 19:04
To start my ubuntu deployment i have the following ... boot env local, stage ssh-access and profile ubuntu 16.04

johnsutten
2017-10-16 19:17
never mind - i had to upload the iso again

johnsutten
2017-10-16 20:20
why am i getting these errors in stable? Machine d3d3bc6b-9543-4f99-870e-d2f7b926f891 wants Stage ubuntu-14.04-install, which is not available

johnsutten
2017-10-16 20:20
I made sure i have a global map and the iso exists

johnsutten
2017-10-16 20:21
{ "Available": true, "Description": "Global Ubuntu14 Stage Map", "Name": "UBUNTU14", "Params": { "change-stage/map": { "discover": "ubuntu-14.04-install:Reboot", "ssh-access": "complete-nowait:Success", "ubuntu-14.04-install": "ssh-access:Success" } } }
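
To sanity-check a stage map like the one above, it can help to walk it from the first stage and see where it ends up. A small sketch, assuming each value has the form `next-stage:Action` as in the posted profile; the action suffix is simply ignored here.

```python
import json

PROFILE_JSON = """
{ "Available": true, "Description": "Global Ubuntu14 Stage Map", "Name": "UBUNTU14",
  "Params": { "change-stage/map": {
      "discover": "ubuntu-14.04-install:Reboot",
      "ssh-access": "complete-nowait:Success",
      "ubuntu-14.04-install": "ssh-access:Success" } } }
"""


def walk_stage_map(profile: dict, start: str) -> list:
    """Return the stages visited from `start`, following change-stage/map."""
    stage_map = profile["Params"]["change-stage/map"]
    path = [start]
    while path[-1] in stage_map:
        # Values look like "next-stage:Action"; keep only the stage name.
        next_stage = stage_map[path[-1]].split(":", 1)[0]
        seen_before = next_stage in path
        path.append(next_stage)
        if seen_before:      # the map loops back on itself, stop here
            break
    return path
```

Calling `walk_stage_map(json.loads(PROFILE_JSON), "discover")` walks discover, ubuntu-14.04-install, ssh-access, and stops at complete-nowait, which has no entry of its own. A map that loops shows the repeated stage as the last element.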

johnsutten
2017-10-16 20:27
I don't have to restart services every time i add a workflow do i?

greg
2017-10-16 20:29
When adding isos with stable, yes. There is a bug fixed in tip about iso importing.

johnsutten
2017-10-16 20:31
soon as i restart maybe ubuntu 16 will install without issue

johnsutten
2017-10-16 20:33
@greg - at what part do i modify the install to use ext4 and create a swap and not to use LVM groups etc

greg
2017-10-16 20:34
You would need to create a custom bootenv, stage, and preseed template.

greg
2017-10-16 20:34
You would need to recreate the preseed template that does the partitioning how you like, you would need to create a custom bootenv that references that preseed, and a custom stage that uses that bootenv.

johnsutten
2017-10-16 20:35
Ok - I haven't a clue where to start to change that, do we have any documentation?

greg
2017-10-16 20:35
read the docs has parts on bootenvs and templates.

greg
2017-10-16 20:35
Not much on stages yet.

johnsutten
2017-10-16 20:47
@greg Everything else is great I am just looking to change the partition layout

johnsutten
2017-10-16 20:47
I'll see what i can come up with and get a nod if it looks ok

johnsutten
2017-10-16 23:14
top

johnsutten
2017-10-16 23:14
wrong window

johnsutten
2017-10-16 23:16
Ok Ubuntu 16.04 won't install at all... 14.04 however appears that it might make it all the way though it has been stuck on the last 'preseed' at 18% for some time and has finished the grub2 package and language installation

zehicle
2017-10-17 01:05
sadly, there is no "just" when talking about partition layouts in preseed.

shane
2017-10-17 03:39
- I've updated the "5min-drp" demo tooling to support unique "cluster name prefixes" - this means you can use the tool to deploy multiple DRP clusters in http://packet.net - in the same PROJECT. https://github.com/digitalrebar/provision/tree/master/examples/5min-drp

shane
2017-10-17 03:40
(note to modify the `vars.tf` parameter named `cluster_name` - documented in README)

2017-10-17 14:10
cool!

wdennis
2017-10-17 16:51
repost: https://rackn.slack.com/files/U416T0AAX/F7GU3ADNE/pxe_install_os_options.pdf I had asked: take a look at the attached file; my question is what options does DRP support today, which may it support in future, which are unsupported (possibly b/c underlying install answer file format does not have the capability?) cc: @greg

wdennis
2017-10-17 16:51

zehicle
2017-10-17 17:09
NEW SCREEN in UX - we've added a page (RackN registration required) that allows bulk editing of nodes to set profiles, stages, bootenvs and take plugin actions

vlowther
2017-10-17 17:26
@wdennis The only one that is tricky is 1, as that involves DRP being able to inventory the system and report what disks are available, which is not something we support right now (but which is on the roadmap), unless you want to do some %pre magic and its equivalent in Debian seed files.

vlowther
2017-10-17 17:30
3 is easily refactorable to something involving a package-list parameter, setting that parameter appropriately (in a profile or on a machine directly), and modifying the relevant bootenvs and kickstart/seed templates to expand that parameter if it is set -- the text.template language we write templates in can handle that task easily.

vlowther
2017-10-17 17:31
How 4 and 5 would be handled is different depending on whether you are using tasks

vlowther
2017-10-17 17:32
in the community content I would handle them as optionally-included templates to be expanded if accompanying parameters are set

vlowther
2017-10-17 17:33
in much the same way the extra templates are expanded in the current centos7 kickstart template: https://github.com/digitalrebar/provision-content/blob/master/templates/ce-centos-7.ks.tmpl

vlowther
2017-10-17 17:34
if you are using rackn licensed content and have access to tasks I would write those as tasks.

vlowther
2017-10-17 17:37
and depending on how complex your partitioning requirements are, 2 can also be handled by making a template for each partition scheme you want and then conditionally including the appropriate one based on a parameter
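
The "one template per partition scheme, picked by a parameter" idea can be illustrated like this. DRP templates are written in Go's text/template, so this Python version is only a stand-in for the selection logic; the `partition-scheme` parameter name and the snippet names are hypothetical, while the `d-i` lines are ordinary debian-installer preseed keys.

```python
from string import Template

# Hypothetical scheme names; each value is a small preseed fragment.
PARTITION_SCHEMES = {
    "lvm": "d-i partman-auto/method string lvm\n",
    "plain-ext4": (
        "d-i partman-auto/method string regular\n"
        "d-i partman/default_filesystem string ext4\n"
    ),
}

# Stand-in for the main seed template that includes one fragment.
PRESEED = Template("d-i netcfg/get_hostname string $hostname\n$partitioning")


def render_preseed(params: dict) -> str:
    """Render the seed file, including the fragment named by the parameter."""
    scheme = params.get("partition-scheme", "lvm")   # hypothetical param name
    if scheme not in PARTITION_SCHEMES:
        raise ValueError(f"unknown partition scheme: {scheme}")
    return PRESEED.substitute(hostname=params["hostname"],
                              partitioning=PARTITION_SCHEMES[scheme])
```

Setting the parameter in a profile or on a single machine would then switch the layout without touching the main template, which is the point of the approach described above.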

vlowther
2017-10-17 17:39
as for when we would update the current content to operate as I have outlined, well, Greg and I only have so much bandwidth. :confused:

wdennis
2017-10-17 18:24
@vlowther understood

vlowther
2017-10-17 18:29
Of course, if someone from the community were to step in and undertake this work... :slightly_smiling_face:

wdennis
2017-10-17 19:18
starts reading The Go Programming Language :stuck_out_tongue_winking_eye:

shane
2017-10-17 19:18
@wdennis all of that is completely accomplishable via tasks/templates/profiles work

wdennis
2017-10-17 19:21
@shane Given Time, sounds achievable then...

shane
2017-10-17 19:22
to develop a fully flexible and generically usable set of Seed/KS files - maybe - but to generate custom configs required for your use case - that's not something that will take very long to do

vlowther
2017-10-17 20:16
@wdennis unless you want to start hacking on the core or the CLI, https://godoc.org/text/template is probably the best read.

wdennis
2017-10-17 20:17
@shane @vlowther cool, let's see what I can come up with...

johnsutten
2017-10-17 21:36
Are the resident openstack people on?

johnsutten
2017-10-17 21:36
expert*

greg
2017-10-17 22:28
: Question - is it reasonable for those wanting to use TIP content to be required to use a TIP DRP?

greg
2017-10-17 22:31
I'm working through some of the upgrade, update, and replace issues. The feature flags we are adding work for tracking content expectations, but I've been trying to avoid bi-directional feature flags between content and DRP. Content objects include flags which allow content to express their requirements. I'd like to avoid the other way if possible, that is, having content tasks smart enough to know which version of DRP they are on (through feature flags).

greg
2017-10-17 22:34
Well - in looking around, I can make it work. Sorry for the noise.

shane
2017-10-17 22:39
@greg In general - I'd expect that a given DRP release version should have a set of tested/validated Content that relates to that version. Any updates/enhancements may have potentially breaking changes - and as such, I'd think that "Stable" DRP and "Stable" Content should be expected to work - but "Stable DRP" and "TIP Content" is a "maybe it works" ... and "maybe it doesn't" ... prospect

chermack
2017-10-17 22:43
added the field

greg
2017-10-17 22:48
@shane - I'm planning for that to be the default position. My table is:
Stable DRP + Stable Content = Works!
Stable DRP + TIP Content = Could work if content pays attention to DRP Feature Flags
TIP DRP + Stable Content = Works (except at major release boundaries, but hopefully then as well)
TIP DRP + TIP Content = May work depending upon the feature flags again.
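
The table reads naturally as a lookup keyed on the two channels. A small sketch, with the wording of each outcome paraphrased from the message above; the function and constant names are made up for illustration.

```python
# Keys are (DRP channel, content channel); values paraphrase the table above.
COMPATIBILITY = {
    ("stable", "stable"): "works",
    ("stable", "tip"): "may work (content must honor DRP feature flags)",
    ("tip", "stable"): "works (except possibly at major release boundaries)",
    ("tip", "tip"): "may work (depends on feature flags)",
}


def expectation(drp: str, content: str) -> str:
    """Look up the expected outcome for a DRP/content channel pairing."""
    key = (drp.lower(), content.lower())
    if key not in COMPATIBILITY:
        raise ValueError(f"unknown channel pairing: {drp!r} + {content!r}")
    return COMPATIBILITY[key]
```

For example, `expectation("Stable", "TIP")` returns the "may work" entry, which is the case that needs feature-flag awareness in the content.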

shane
2017-10-17 22:49
Sounds like a sane strategy to me ... anyone else have any input on this policy ?

johnsutten
2017-10-17 23:05
Hi all, thought i had it and then i lost it... where is the default ssh username and pass stored for ubuntu 16.04

2017-10-17 23:06
@greg `work if` == `may work`, so
_ Stable Content = `work` anyway
_ TIP Content = `may work` anyway
`may work` == *do not use it*

shane
2017-10-17 23:10
@johnsutten - it's defined in net-seed.tmpl (templates)

shane
2017-10-17 23:10
default username specified if not overridden by parameter is "rocketskates"

johnsutten
2017-10-17 23:46
@shane i went ssh rocketskates@10.0.0.11 and used the same password and it failed - permission denied

shane
2017-10-17 23:47
you have to enable SSH username/password support versus SSH Key based access - as some people consider user/pass pairs to be a security vulnerability/risk

johnsutten
2017-10-17 23:48
you're talking about the ssh root mode right?

wdennis
2017-10-18 02:25
Looks like when we can select multiple disks, in the preseed case, can do some interesting stuff... See https://anonscm.debian.org/cgit/d-i/debian-installer.git/tree/doc/devel/partman-auto-raid-recipe.txt for some examples


wdennis
2017-10-18 03:50
How to test a change to a forked repo (a template in "provision-content") before I submit a pull request? The template in question (`ce-root-remote-access.tmpl`) is "locked" in my DRP

shane
2017-10-18 03:51
@wdennis - you can clone that template - then where it's being called from, change the call to use the newly cloned template ... there will likely be a "chain" of clones you need to create to make the changes ...

shane
2017-10-18 03:52
make changes in the clones - which will be r/w

wdennis
2017-10-18 03:54
@shane OK, thx

greg
2017-10-18 04:05
Okay - the way I test it. I used the `tools/package.sh` from the top directory. This builds a new yaml file.

greg
2017-10-18 04:07
You can then use ```drpcli contents update drp-community-content - < drp-community-content.yaml``` to update your local content and test. You can always reimport from the RackN Portal if you need to reset.

greg
2017-10-18 04:08
It depends upon how you like to edit the objects. The challenge with the clone method is that when you are done, you will need to translate those clones into the repo objects.

wdennis
2017-10-18 04:09
Just directly `vi` the relevant .tmpl file and then do the `tools/package.sh` thing?

greg
2017-10-18 04:09
yeah

wdennis
2017-10-18 04:09
Cool

greg
2017-10-18 04:10
it will yaml / object validate as part of the build process.

greg
2017-10-18 04:10
Then you update the content. It should even generate a special version for you.

wdennis
2017-10-18 04:10
Especially b/c the UX "Clone" function of templates does not seem to be working for me...

greg
2017-10-18 04:11
The clone method is the preferred method for building your own content.

greg
2017-10-18 04:11
though fixing the bugs would be good.

wdennis
2017-10-18 04:12
I select the stock template, click "Clone", then change the ID and Contents, but when I click "Add", it never returns...

wdennis
2017-10-18 04:13
If I do a browser refresh, it seems that it had worked, the new template is there with the "unlocked" icon

greg
2017-10-18 04:13
okay - good to know.

greg
2017-10-18 04:14
Opened issue

greg
2017-10-18 04:14
#504

johnsutten
2017-10-18 14:09
Morning all, I deployed an Ubuntu 16 environment with the ssh access. I haven't modified anything. I am not able to login with rocketskates. Can someone take a look at my environment? https://drclass.010101.info:8092

greg
2017-10-18 14:12
So - you need to create a clone of the `ce-root-access` profile and put YOUR public key in the map of `access-keys`. You currently have my Mac book public key as an example.

greg
2017-10-18 14:13
steps to move forward.

greg
2017-10-18 14:13
1. clone `ce-root-access`

greg
2017-10-18 14:13
2. add your public ssh keys to that map of `access-keys`

greg
2017-10-18 14:13
3. save that profile.

greg
2017-10-18 14:14
4. add cloned profile to machine

greg
2017-10-18 14:14
5. remove `ce-root-access` profile from machine.

johnsutten
2017-10-18 14:14
Thanks!

greg
2017-10-18 14:15
6. set stage of machine back to ubuntu-16.04-install

greg
2017-10-18 14:15
7. FIX WORKFLOW maps

greg
2017-10-18 14:15
8. reboot machine.
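
Steps 1 to 3 amount to copying the profile and swapping the key map. A minimal sketch of that transformation on the profile's JSON form, assuming the `Name`/`Params` layout seen in the stage-map profile posted earlier in the channel; the new profile name and the key labels are placeholders.

```python
import copy


def clone_access_profile(source: dict, new_name: str, public_keys: dict) -> dict:
    """Copy a profile and replace its `access-keys` map with your own keys."""
    clone = copy.deepcopy(source)       # leave the stock profile untouched
    clone["Name"] = new_name
    clone.setdefault("Params", {})["access-keys"] = dict(public_keys)
    return clone


# Placeholder data: the real stock profile carries someone else's key.
stock = {"Name": "ce-root-access",
         "Params": {"access-keys": {"example": "ssh-rsa AAAA...example"}}}
mine = clone_access_profile(stock, "my-root-access",
                            {"laptop": "ssh-ed25519 AAAA...placeholder"})
```

The resulting dict is what would be saved as the cloned profile before it is added to the machine in step 4.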

greg
2017-10-18 14:16
The workflow needs to be fixed before you reboot the machine.

greg
2017-10-18 14:16
You don't have one.

greg
2017-10-18 14:16
oh yeah- you do.

greg
2017-10-18 14:16
sorry - let me look at it

johnsutten
2017-10-18 14:17
As i get closer to the end of this I would like to have Digital Rebar as one of the first topics in the LMS / MOODLE environment for people to learn.

greg
2017-10-18 14:17
nvm - that looks good . assuming you put UBUNTU16 on the machine and it appears you did.

johnsutten
2017-10-18 14:17
Yes I did works great

greg
2017-10-18 14:18
The default access for ubuntu should be: rocketskates/RocketSkates

greg
2017-10-18 14:18
but only from a tty (not ssh by default).

johnsutten
2017-10-18 14:18
Still need to make those changes then

johnsutten
2017-10-18 14:18
Needs to be ssh

wdennis
2017-10-18 15:00
@greg Where are the template files in the DRP isolated-mode tree?

wdennis
2017-10-18 15:01
I'm finding `*tmpl.json` files in the `drp-data/digitalrebar/templates` path, but not the "official" DR ones

wdennis
2017-10-18 15:02
The stuff in there looks to be the "unlocked" ones, including my clones of the official DR provided ones

shane
2017-10-18 15:03
content will be rolled up in the drp-data/saas-content/ directory

shane
2017-10-18 15:05
if you're looking to snag an existing template to modify via CLI - you can do something like:
```
drpcli templates list | jq '.[].ID'  # get list of ID names
drpcli templates show net-seed.tmpl
```

wdennis
2017-10-18 15:07
@shane I want to make a direct edit to the "root-remote-access.tmpl" then run `tools/package.sh` as @greg had indicated

wdennis
2017-10-18 15:21
OK, have edited `./drp-data/saas-content/os-discovery-[...].yaml` with the fix I'm proposing

wdennis
2017-10-18 15:22
Now I find that there's no `./tools/package.sh`

shane
2017-10-18 15:22
that's in the github repo - clone it locally

wdennis
2017-10-18 15:22
OK, did that

wdennis
2017-10-18 15:23
But now I'm getting an error `cp: cannot stat 'assets/startup': No such file or directory`

shane
2017-10-18 15:24
there are a lot of hardcoded dependencies in that script

shane
2017-10-18 15:24
it assumes you're using it in a git checkout, and compiled

shane
2017-10-18 15:24
here's @greg to clear up how to use it now:

wdennis
2017-10-18 15:25
Looks like it's choking on:
```
+ cp -a assets/startup /tmp/rs-bundle-HVe8zmX6/assets
cp: cannot stat 'assets/startup': No such file or directory
```

greg
2017-10-18 15:25
yeah - sorry - my commentary was more for if you wanted to make a PR against the tree. You would need to run all the package commands from the content tree.

wdennis
2017-10-18 15:26
I'm trying to test a proposed template change before I submit

wdennis
2017-10-18 15:26
Don't really need to build DRP, etc

greg
2017-10-18 15:27
Correct. You would just need to be able to package the content.

greg
2017-10-18 15:27
Give me two seconds - since I started us down this path.

greg
2017-10-18 15:32
Here are the steps:
1. install bsdtar (this could be just tar in the future) (the package.sh will force you to anyway).
2. git clone https://github.com/digitalrebar/provision-content
2.5 cd provision-content
3. edit templates as you like
4. tools/package.sh
5. upload built yaml file into DRP.

wdennis
2017-10-18 15:36
Cool, OK

greg
2017-10-18 15:36
I think you were building from the DRP directory and not a clone of the content directory.

wdennis
2017-10-18 15:37
No, I have a running isolated-install of DRP and just want to test that the proposed template change works before I submit a pull request for the change

wdennis
2017-10-18 15:38
Want to integrate the changed template into that

greg
2017-10-18 15:38
okay

greg
2017-10-18 15:39
I should still document the content development steps.

wdennis
2017-10-18 15:41
So I guess I'll have to test with the ce-* templates (right now using the non-ce ones)

wdennis
2017-10-18 15:41
Shouldn't be a problem I guess

greg
2017-10-18 15:43
Okay - for those. You can be hacky.

greg
2017-10-18 15:43
stop drp, edit the drp-data/saas-content/<file of choice>, start drp

greg
2017-10-18 15:44
actually, @shane is right. Clone is probably better and send an Issue with the object dump of the object to update from the cli.

shane
2017-10-18 15:45
only prob. with that is the Contents field is "escaped", making it hard to edit - need to unescape (explode out), then "repack" the content some how ... easily ...

greg
2017-10-18 15:46
oh that is possible.

shane
2017-10-18 15:46
sure it's possible - "easily" was thrown in there ... sed/awk'ing that wouldn't be very fun

greg
2017-10-18 15:46
no - drpcli unbundle

shane
2017-10-18 15:46
:slightly_smiling_face: nice

greg
2017-10-18 15:47
create a directory.

greg
2017-10-18 15:47
cd into directory

greg
2017-10-18 15:47
drpcli contents unbundle <saas content file> --format=yaml

greg
2017-10-18 15:48
yaml for sanity
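
For a single object dumped from the CLI, the escape/repack round trip shane mentions can also be left to a JSON library rather than sed/awk. A rough sketch, assuming the dump is JSON with `ID` and `Contents` fields as seen in the UX; the sample object in the usage note is invented.

```python
import json


def extract_contents(template_json: str) -> str:
    """Return the unescaped template body from a serialized template object."""
    return json.loads(template_json)["Contents"]


def repack_contents(template_json: str, new_contents: str) -> str:
    """Put an edited body back, letting the JSON encoder redo the escaping."""
    obj = json.loads(template_json)
    obj["Contents"] = new_contents
    return json.dumps(obj, indent=2)
```

Typical use would be: save the dump to a file, write `extract_contents(...)` out to a scratch file, edit it, then feed the edited text through `repack_contents(...)` to get an object that can be sent back.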

wdennis
2017-10-18 15:48
Actually, for what I'm trying to test, ce-* templates should work?

shane
2017-10-18 15:49
nice - unbundle/bundle - like it

greg
2017-10-18 15:50
You can do all sorts of `bad` things with those commands and the API calls.

wdennis
2017-10-18 15:50
So I cloned the "provision-content" repo, made the change to the relevant template, then in top level ran `./tools/package.sh`

wdennis
2017-10-18 15:51
Resulting in a new `drp-community-content.[yaml|sha256]` files

greg
2017-10-18 15:52
yep

greg
2017-10-18 15:52
YEAH!

wdennis
2017-10-18 15:52
Now can use those in existing DRP installation?

greg
2017-10-18 15:52
```drpcli contents update drp-community-content - < drp-community-content.yaml```

wdennis
2017-10-18 15:53
Need the `.sha256` in there as well?

greg
2017-10-18 15:53
```cat ._Version.meta``` should show you the version it decided to give you.

greg
2017-10-18 15:53
The sha is for us on download/upload to and from saas.

greg
2017-10-18 15:54
The UX and the installer use it to make sure nothing was messed up in transit.
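
The same kind of transit check can be done by hand. A minimal sketch, assuming the `.sha256` file begins with the hex digest (as in `sha256sum` output); the actual layout of the generated file may differ.

```python
import hashlib
from pathlib import Path


def sha256_matches(content_path: str, digest_path: str) -> bool:
    """Compare a file's SHA-256 with the hex digest stored beside it.

    Assumes the digest file starts with the hex digest, as in the output
    of `sha256sum`; the real file layout may differ.
    """
    hasher = hashlib.sha256()
    with open(content_path, "rb") as handle:
        # Read in chunks so large content packs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    fields = Path(digest_path).read_text().split()
    if not fields:
        return False
    return hasher.hexdigest() == fields[0].lower()
```

For example, `sha256_matches("drp-community-content.yaml", "drp-community-content.sha256")` would report whether the packaged yaml still matches its recorded digest.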

wdennis
2017-10-18 15:55
no `._Version.meta` on the filesystem - or is that in a file?

wdennis
2017-10-18 15:55
I see this in the .yaml file:
```
meta:
  Description: Digital Rebar Provision Community Content
  Name: drp-community-content
  Source: https://github.com/digitalrebar/provision-content
  Version: v1.0.0-tip-9-3cfc1b162a77010c6930ef7e65a5a746ad85a84
```

greg
2017-10-18 15:56
yeah - for that content you get that.

greg
2017-10-18 15:56
So, you haven't made a local commit yet.

greg
2017-10-18 15:56
If you have made a local commit for your change, the version should change to indicate it.

wdennis
2017-10-18 15:56
Right - not yet

wdennis
2017-10-18 15:56
Let me do that now

wdennis
2017-10-18 15:57
Commit then `package.sh`?

greg
2017-10-18 15:57
yes

wdennis
2017-10-18 16:01
OK, did that, now have:
```
[dradmin@dr-admin provision-content]$ head -n5 drp-community-content.yaml
meta:
  Description: Digital Rebar Provision Community Content
  Name: drp-community-content
  Source: https://github.com/digitalrebar/provision-content
  Version: v1.0.0-tip-dradmin-dev-10-5dc611603bba0352c887efed813e62ec8451f32f
```

greg
2017-10-18 16:02
that way you can keep track of what you changed.

greg
2017-10-18 16:02
dradmin user dev 10 commits ahead of tip.
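
Going by the two version strings posted above, the generated version can be split into its parts. This pattern is inferred from those two examples only and is not a documented format; the names below are made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Inferred layout: <base>-tip[-<user>-dev]-<commits ahead>-<commit hash>
VERSION_RE = re.compile(
    r"^(?P<base>v\d+\.\d+\.\d+)-tip"
    r"(?:-(?P<user>.+?)-dev)?"
    r"-(?P<commits>\d+)-(?P<commit_hash>[0-9a-f]+)$"
)


class ContentVersion(NamedTuple):
    base: str
    user: Optional[str]      # None when built from an unmodified checkout
    commits_ahead: int
    commit_hash: str


def parse_version(text: str) -> ContentVersion:
    """Split a generated content version string into its components."""
    match = VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text}")
    return ContentVersion(match["base"], match["user"],
                          int(match["commits"]), match["commit_hash"])
```

Parsing the second string gives user `dradmin` and 10 commits, matching the reading "dradmin user dev 10 commits ahead of tip"; the first string has no user part.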

wdennis
2017-10-18 16:03
OK

wdennis
2017-10-18 16:11
UX doesn't show version of Content Packages any longer?

greg
2017-10-18 16:11
hover over tip

wdennis
2017-10-18 16:13
Oh that's interesting - My ver is current, but UX says there's an "Upgrade Available" to the (now-older) DR-provided one

greg
2017-10-18 16:13
Yeah - we are still working on that.

wdennis
2017-10-18 16:13
OK

wdennis
2017-10-18 16:14
Right - no UX edit an existing Stage Map?

wdennis
2017-10-18 16:17
Sorry, confused now - want to change my Stage Map (Workflow) to use the "ce-ubuntu-16.04-install" instead of "ubuntu-16.04-install" bootenv, which would (I'm guessing) pick up on my updated ce-* template

wdennis
2017-10-18 16:18
Do I have to create my own Stage that calls that bootenv?

shane
2017-10-18 16:20
you could clone that existing stage, edit it as you mention, then apply that stage to a specific Machine to test it

shane
2017-10-18 16:20
instead of using global and applying to all

wdennis
2017-10-18 16:22
Yes, I actually don't use Global, created my own profile for installing Ubuntu

shane
2017-10-18 16:22
Good man !

wdennis
2017-10-18 16:22
Let's see if it works now (really running out of time, but... so close!)

wdennis
2017-10-18 16:23
Hmmm, lots of UX button bugs (at least on OS X Safari...)

lae
2017-10-18 17:17
alright so

lae
2017-10-18 17:17
I've contributed some of the changes I've used in my own DRP environment to CE provision-content

greg
2017-10-18 17:51
Nice

greg
2017-10-18 18:45
@lae - I like the changes and adds. I'm thinking through the pull and add. The main issue I'm thinking through is the 7.3 and 7.4 change. The change is good. I'm debating about keeping it or not.

zehicle
2017-10-18 19:43
Keeping 7.3? Is there a way to archive older bootenvs so they don't clutter up the packs? Maybe a "historical" content on CE?

johnsutten
2017-10-18 19:49
@greg- can you take a look at my UBUNTU-remote-access template? i cloned it from the ce-root-remote-access template

johnsutten
2017-10-18 19:50
also made a corresponding param as well

johnsutten
2017-10-18 20:01
@greg when i try to put my ssh key into UBUNTU-ce-root-access it comes up blank

johnsutten
2017-10-18 20:01
after i save it

johnsutten
2017-10-18 20:10
modified it at the command line..

johnsutten
2017-10-18 20:11
now where is that restart command for the service?

johnsutten
2017-10-18 20:15

greg
2017-10-18 20:19
You shouldn't have to restart the service, just reset the stage on the machine and then reboot the machine.

lae
2017-10-18 20:21
@zehicle technically, it would still be available in the git history

lae
2017-10-18 20:21
and git tags/releases

greg
2017-10-18 20:26
It is not a history problem; it is a use within existing users. For example, user is on stable and installs c7.3. Updates to tip and all the c7.3 installs and workflows and stages that depend upon it are broken because it disappears.

greg
2017-10-18 20:27
You get warnings and such.

lae
2017-10-18 20:27
ah okay

lae
2017-10-18 20:27
Would it make sense to just make the CentOS 7 bootenv similar to the Ubuntu one and not specify subrelease?

greg
2017-10-18 20:27
My feeling now is to keep it. And deprecate it (add as a feature flag and remove in a few releases).

lae
2017-10-18 20:27
i.e. centos-7-install

greg
2017-10-18 20:28
Yeah - that is an interesting equivalent, but on different timescales.

greg
2017-10-18 20:28
May need to think about it.

greg
2017-10-18 20:29
Add a centos-install, centos-7-install, centos-7.3.1608 install. And ref appropriately.

lae
2017-10-18 20:29
is that an ask?

greg
2017-10-18 20:30
no - sorry - thinking out loud.

greg
2017-10-18 20:30
similar for debian and ubuntu.

greg
2017-10-18 20:30
implicitly asking for thoughts.

lae
2017-10-18 20:33
I'm not sure having a `centos-install` bootenv (I assume you mean it would always point to latest, so when e.g. RHEL/CentOS 8 is out...) would actually be a good idea, since it might end up in people thinking it'll just be CentOS 7 and start using it over `centos-7-install`...and then end up breaking things when CentOS 8 is out

lae
2017-10-18 20:33
I prefer to have the major version specified

greg
2017-10-18 20:33
yeah - wondering about levels. That is what centos pushes for its container bases.

greg
2017-10-18 20:36
for ubuntu XX.YY as releases.

greg
2017-10-18 20:36
Debian-X as releases

johnsutten
2017-10-18 21:56
how do i tell my DRP to look for updates.. my endpoint is no longer responding

johnsutten
2017-10-18 21:58
whether i try to connect via IP or FQDN

johnsutten
2017-10-18 22:12
is there any way to try and 'repair' my DRP?

johnsutten
2017-10-18 22:31
i had to reinstall DRP to get it working...

johnsutten
2017-10-18 22:56
Working through this checklist again....

johnsutten
2017-10-18 22:56
1. clone `ce-root-access`
2. add your public ssh keys to that map of `access-keys`
3. save that profile.
4. add cloned profile to machine
5. remove `ce-root-access` profile from machine.

johnsutten
2017-10-18 22:59
1 - cloned to UBUNTU-ce-root-access

johnsutten
2017-10-18 23:00
replaced old ssh key from gregs mac

johnsutten
2017-10-18 23:00
saved the clone

johnsutten
2017-10-19 00:07
looks like it worked ! now onto deploying openstack!

zehicle
2017-10-19 02:42
Cool

johnsutten
2017-10-19 12:19
still working through one issue... i have machines exactly the same and at times they never pick up dhcp when provisioning

zehicle
2017-10-19 12:41
Sometimes that means there is another DHCP server on your network.

johnsutten
2017-10-19 12:41
nope it is isolated

johnsutten
2017-10-19 12:44
i give a kudos to who ever optimized the code... my server is idle with DRP on and my load average is 0.0 0.00 and 0.0.5

johnsutten
2017-10-19 13:17
another item i noticed is when i stop the services and start them up after making changes at times i no longer get the messages of the services starting up anymore

wdennis
2017-10-20 00:19
@greg @shane Looks like I have an install loop going on with my latest workflow?

wdennis
2017-10-20 00:20
Any way to debug on the DRP side?

greg
2017-10-20 00:20
Nice! Unless you didn't intend that

wdennis
2017-10-20 00:20
Looked at the machine console, and it is indeed in the Ubuntu installer doing things

greg
2017-10-20 00:21
You can check jobs to see what is progressing.

wdennis
2017-10-20 00:21
I have a stage "TEST-ubuntu-16.04-install" that goes to ssh-access:Success

wdennis
2017-10-20 00:22
then ssh-access to complete-nowait:Success

greg
2017-10-20 00:22
Did your machines start in the TEST-* stage?

wdennis
2017-10-20 00:23
yes

wdennis
2017-10-20 00:24

wdennis
2017-10-20 00:29

wdennis
2017-10-20 00:31
@greg are there logs? I don't see the workflow stages in jobs...

greg
2017-10-20 00:33
You can look inside the job at the top of the list to see its output

greg
2017-10-20 00:34
It looks like it had changed stage three times

wdennis
2017-10-20 00:35
Not the right machine?

wdennis
2017-10-20 00:36
I don't see anything in the Jobs screen for this particular workflow

wdennis
2017-10-20 00:37
Trying to remember when I kicked it off... may have been this morning

wdennis
2017-10-20 00:37
Or last night

greg
2017-10-20 00:38
May need to restart the machine. If you restarted drp the tokens will be invalid

greg
2017-10-20 00:38
Or they timeout.

wdennis
2017-10-20 00:39
Which machine - the DRP host, or the target install server?

wdennis
2017-10-20 00:43
^^ @greg

greg
2017-10-20 00:49
The installing host

wdennis
2017-10-20 00:49
OK, guessed that's what you meant, so did so...

wdennis
2017-10-20 00:53

wdennis
2017-10-20 00:53
Anything look wrong here to you? (Doesn't to me...)

wdennis
2017-10-20 00:56
Using the ce-* bootenv in hopes that it will test my updated `ce-root-remote-access.tmpl` template

greg
2017-10-20 00:59
Don't use the ce-* bootenvs with stages. They don't run a runner

greg
2017-10-20 00:59
So the stages aren't run

wdennis
2017-10-20 00:59
So I have to use the non-ce?

greg
2017-10-20 01:03
Yes. I'm thinking of changing this and getting rid of the ce-*. It is getting too confusing to separate the models. If you use the ce-* you are not using stages and tasks and need to let the bootenvs fall through

wdennis
2017-10-20 01:03
Did not know that...

wdennis
2017-10-20 01:04
Are workflows/stages DR-login only features?

greg
2017-10-20 01:10
Yes, log in and you get access to them

wdennis
2017-10-20 01:12
@greg How can I test the proposed change to a template (cloned from the provision-content repo)

wdennis
2017-10-20 01:13
That's all ce-* stuff, right?

greg
2017-10-20 01:15
Yes. Set the node to the *none* stage and the ce-*-install bootenv. Reboot the node and watch it install and move to the local bootenv

greg
2017-10-20 01:15
That should test it

wdennis
2017-10-20 01:15
OK

greg
2017-10-20 01:16
You can also put a PR out there and I'll look at it. I need to work on @lae's PRs as well

wdennis
2017-10-20 01:16
Was going to test before submitting the PR :slightly_smiling_face:

wdennis
2017-10-20 01:18
Wait, there is no `none` stage in the drop list...

greg
2017-10-20 01:19
There will be.

greg
2017-10-20 01:19
Through drpcli, set stage to ""

wdennis
2017-10-20 01:20
Can you give me the drpcli command syntax?

greg
2017-10-20 01:21
`drpcli machines stage <uuid> "" --force`

wdennis
2017-10-20 01:22
```
[dradmin@dr-admin drp]$ drpcli machines stage 5fcbf69d-287e-4c2c-b085-5858665cd442 ""
Error: Can not change stages with pending tasks unless forced
```

greg
2017-10-20 01:22
sorry, add the `--force` flag

wdennis
2017-10-20 01:23
n/p, was guessing that a "force" flag was needed

wdennis
2017-10-20 01:27
Here we go again...

johnsutten
2017-10-20 12:25
Morning all, sifting through the 'stuff' out there about Kubernetes (k8), Docker, mesosphere, cloudstack and openstack... In all the things i am seeing, what would be the leanest first run to show how DRP works in allowing people to create what they need... today on the front page of docker is they are putting k8 with their service...

johnsutten
2017-10-20 12:37
the other question i have is what tool could i use that give a dashboard of available resources and deploy a kvm / vm on top of ubuntu16?

shane
2017-10-20 13:15
@johnsutten you could try the Kubernetes integration we've done - via Kubespray: http://provision.readthedocs.io/en/tip/doc/integrations/ansible.html But in general - we don't do a lot of "Application Stack" integrations - simply because that isn't what the DRP solution is about - it's about building Bare Metal for YOUR environment, and you can put YOUR workloads on the metal

shane
2017-10-20 13:15
the Kubernetes stuff we have is a demonstration for that reason

shane
2017-10-20 13:16
what is your use case for KVM ? there are a LOT of "UI/Dashboard" based KVM controller solutions available out there ...

johnsutten
2017-10-20 13:35
@shane - this needs to show the dashboard can have a users with quotas and then show what is available to 'spin up' a vm / kvm or container

johnsutten
2017-10-20 13:36
whether that is handled all under kubernetes and openstack or docker or virtual box or vagrant

shane
2017-10-20 13:37
well - Digital Rebar is not really a provider for that sort of thing ... but Virtualization and Containerization management solutions are typically separate tools / things

johnsutten
2017-10-20 13:38
leveraging kubernetes is something that is a part of DR - so then what do most of the community / customers use when deploying DR with 'k8' ?

johnsutten
2017-10-20 13:49

shane
2017-10-20 13:49
nope

johnsutten
2017-10-20 13:52
from what i am reading about it I would use DR and deploy my ubuntu nodes and then run this kubernetes tool.

johnsutten
2017-10-20 13:58
@shane one other question, what is the most 'graceful' method to stopping DR on the server?

shane
2017-10-20 14:52
Just kill (not -9) of the dr-provision service is fine

johnsutten
2017-10-20 15:41
@shane that's a very good housekeeping note!
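
A minimal sketch of the graceful stop described above, assuming a POSIX shell: it sends SIGTERM (the default signal for `kill`, as opposed to `-9`/SIGKILL) and then waits for the process to exit. The function name `graceful_stop` and the 30-second default timeout are illustrative choices, not part of DRP.

```shell
# Send SIGTERM to a PID and wait up to <timeout> seconds for it to exit.
# Returns 0 if the process is gone, 1 if it is missing or still running.
graceful_stop() {
  pid="$1"
  timeout="${2:-30}"
  # Plain TERM, never -9, so the process can shut down cleanly.
  kill -TERM "$pid" 2>/dev/null || return 1
  while [ "$timeout" -gt 0 ]; do
    # kill -0 only tests whether the PID still exists.
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    timeout=$((timeout - 1))
  done
  return 1
}
```

For example, `graceful_stop "$(pgrep -x dr-provision)"` would look the PID up by process name; whether `pgrep` is available, and whether the service is managed by an init system instead, depends on the host.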

zehicle
2017-10-21 00:33
@zehicle uploaded a file: https://rackn.slack.com/files/U02DHRR2L/F7NKC925C/digital_rebar_runner_workflow.png and commented: sharing some early draft of stages/jobs/tasks graphics

wdennis
2017-10-21 02:12
Love it - moar docu!!! (Please and thanks)

wdennis
2017-10-21 02:17
Again (to clarify) - DRP runner mode is just for RackN-registered logins, right?

shane
2017-10-21 02:18
Correct - "stages" and "tasks" are advanced content

wdennis
2017-10-21 02:20
So, stages, their tasks and jobs need the runner mode to process them, correct?

shane
2017-10-21 02:20
You got it. Runner is in the RackN BootEnvs....

wdennis
2017-10-21 02:22
Ok, got it. Then, CE (i.e. open-source bits) just has BootEnvs without the runner invocation at the end?

shane
2017-10-21 02:23
Correct, there is no task flow, so no need for CE content to have runner

wdennis
2017-10-21 02:26
Now, RackN registration (login) is free, and stages/tasks/runners are all free functionality with registration, right? Or is it in the plans to charge for registration at some point?

wdennis
2017-10-21 02:28
Or is it that just some of the RackN-authored stages/other functionality (like plugins) that will be for-charge?

shane
2017-10-21 02:32
2 levels of RackN content, registered free, and registered pay. Stages/tasks/workflow are enabled in the free reg content ("os-linux" content pack for example ), and should remain free


wdennis
2017-10-21 02:33
Ok, cool, that was my understanding too...

wdennis
2017-10-21 02:34
Just wondering if I was right or not...

wdennis
2017-10-21 02:36
Goes without saying, these three levels (CE, RackN free, RackN paid) should be made very explicit on the website / mktg stuff / docu...

wdennis
2017-10-21 02:37
(Guess I just did say it!)

wdennis
2017-10-21 02:45

wdennis
2017-10-21 02:46
Running a container orchestration system on containers so you can run some containers on your containers

zehicle
2017-10-21 02:55
@wdennis I'm confirming Shane's statement that our expectation is that we are not planning to require payment (registration is free) for templates that use stages/runner. We expect that more advanced features will require payment.

wdennis
2017-10-21 14:29
Thanks, @zehicle - just want to be able to answer questions correctly :)

2017-10-22 02:12
Hey so when I say run `sudo ./dr-provision --static-ip=10.9.8.2 --file-root=/Users/gremlin/drb/drp-data/tftpboot --data-root=drp-data/digitalrebar`

2017-10-22 02:12
what are the docker commands happening?

2017-10-22 02:15
i'm wanting to run digitalrebar on http://rancher.com/rancher-os/

shane
2017-10-22 02:19
Hi @hadees, DRPv3 does not use/require docker. You can run/deploy the dr-provision binary in a container if you want, but not required.

2017-10-22 02:19
I thought it was running docker under the hood

shane
2017-10-22 02:20
If you ensure the required supporting packages are installed, the dr-provision binary should run no problem, but we haven't tested it on RancherOS yet. 64 bit Linux is all that is needed for the binary.

shane
2017-10-22 02:21
Pkgs required are 7zip, unzip, and bsdtar.
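
A quick pre-flight check along these lines can confirm the supporting tools are on the PATH before starting dr-provision. This is only a sketch: `missing_commands` is a hypothetical helper, and the binary names passed to it (for instance `7z` for 7zip) are assumptions that vary by distribution.

```shell
# Print the name of every command in the argument list that is not on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

Calling `missing_commands bsdtar unzip 7z` prints nothing when all three are found, and otherwise lists the ones to install.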

shane
2017-10-22 02:21
DRv2 used to, version 3 is a complete rewrite

2017-10-22 19:26
@hadees here's the docker file that's included in the project - https://github.com/digitalrebar/provision/blob/master/Dockerfile

2017-10-22 19:27
it's not our primary testing path, so it may need to be updated.

wdennis
2017-10-23 16:39
Happy to see screen-sharing is a thing in Slack now - may come in useful for those t'shoot sessions...

wdennis
2017-10-23 16:41
Q - is there any standard DR built-in user for Ubuntu installs? Doesn't look like my key injection worked on the last server deploy...

wdennis
2017-10-23 16:44
Used `ce-ubuntu-16.04-install` if it matters

wdennis
2017-10-23 16:46
Also, is there such a thing in Slack as user aliases (or groups)? I'd love to be able to ping "@support" and have it alert the relevant RackN folk...

zehicle
2017-10-23 16:47
does the trick

vlowther
2017-10-23 16:47
rocketskates is the default user.

vlowther
2017-10-23 16:47
should be the same password as the UX.

wdennis
2017-10-23 16:48
Injected by default, @vlowther?

wdennis
2017-10-23 16:48
@zehicle Wouldn't ping all the non-DR folk too?

zehicle
2017-10-23 16:48
ah, yes

shane
2017-10-23 16:49
@wdennis see the net-seed.tmpl for default user injection

zehicle
2017-10-23 16:49
I'm not aware of group alias in Slack. would be handy feature

vlowther
2017-10-23 16:49
yep

shane
2017-10-23 16:50
The at here only alerts people signed in at the time - at channel sends alert to everyone in the channel regardless if they're logged in or not

wdennis
2017-10-23 16:51
I just don't know if you all monitor Slack on an ongoing basis, or I have to alert folks to take a look

wdennis
2017-10-23 16:51
Time is sometimes of the essence :slightly_smiling_face:

zehicle
2017-10-23 16:53
this is the community channel, so nothing urgent - we do have 1x1 support channels that we monitor where a channel ping would only alert the relevant parties

wdennis
2017-10-23 16:57
@zehicle It's just that I have limited time to work on the DR stuff during the day, and when I get a half-hour or hour to work on it, if I have questions, I'd need a timely answer or I run out of my window...

wdennis
2017-10-23 16:57
Most of the time it's been no problem, but sometimes, I have to leave off before I can get an answer

wdennis
2017-10-23 16:59
I do understand "best-effort response" is the SLA :slightly_smiling_face:

wdennis
2017-10-23 17:01
OK, looks like my custom template did not get executed...

wdennis
2017-10-23 17:03
Trying to test my changes to `ce-root-remote-access.tmpl`; I used the `ce-ubuntu-16.04-install` bootenv that I believe would use that when `access-ssh-root-mode` has a value set

wdennis
2017-10-23 17:04
Can anyone verify that the above would utilize my changed template?

greg
2017-10-23 17:04
Yes - ce-ubuntu will pull in that template.

wdennis
2017-10-23 17:05
OK, the changed line didn't work for some reason then...

lae
2017-10-23 17:06
@wdennis you need to reupload the template

lae
2017-10-23 17:06
`drpcli templates upload templates/ce-root-remote-access.tmpl as ce-root-remote-access.tmpl`

lae
2017-10-23 17:06
if you made the change after you've imported those templates previously (i.e. when importing a bootenv)

lae
2017-10-23 17:07
(updating a bootenv doesn't reimport templates I think)

wdennis
2017-10-23 17:09
I have set `access-ssh-root-mode` to "yes"; then replaced the `echo [...] >> /etc/ssh/sshd_config` with my `sed --in-place [...]` that I'm wanting to test in my version of the `ce-root-remote-access.tmpl`

wdennis
2017-10-23 17:09
@lae I did re-import that, @greg gave me instructions on how to do so that the DRP system uses my custom version

lae
2017-10-23 17:09
ah okay

wdennis
2017-10-23 17:23
Trying to change the stage map for my Ubuntu install profile, is not updating for some reason (UX)

wdennis
2017-10-23 17:31
How to manually edit the profile to delete the stage map? (Or does one just delete the stage map, which updates the profile??)

greg
2017-10-23 17:40
You can delete the parameter from the profile

greg
2017-10-23 17:40
Or set it to {}

wdennis
2017-10-23 17:49
Thanks @greg - don't know why I can't delete it from the UX... I do it (click Edit on profile, click Remove by change-stage/map(object) param, then click Save at bottom of pane) but when I pull it back up, it's still there...

wdennis
2017-10-23 17:59
So the drpcli command `drpcli profiles set <my-profile-name> param "change-stage/map" to "{}"` seemed to work... But now when I try to use the UX Workflow screen to re-set-up the stage map and hit Save, it does not update the profile, or actually save the stage map...

greg
2017-10-23 18:01
set it to `null`

shane
2017-10-23 18:02
(`null` is a bare value - no quotes around it)

wdennis
2017-10-23 18:03
I did `drpcli profiles set <my-profile-name> param "change-stage/map" to ""` which returned `null` and now the param is gone

greg
2017-10-23 18:04
cool
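
The parameter removal above can be wrapped in a small helper. This is a sketch built from the `drpcli profiles set ... param ... to ...` form used in this thread, passing the bare `null` value that @greg and @shane suggest; `clear_profile_param` and the `DRPCLI` override are hypothetical names, and the behaviour may differ between DRP versions.

```shell
# Allow the CLI binary to be swapped out (e.g. DRPCLI=echo for a dry run).
DRPCLI="${DRPCLI:-drpcli}"

# Remove a parameter from a profile by setting it to the bare value null.
clear_profile_param() {
  profile="$1"
  param="$2"
  "$DRPCLI" profiles set "$profile" param "$param" to null
}
```

Setting `DRPCLI=echo` before calling `clear_profile_param <my-profile-name> change-stage/map` prints the arguments that would be passed instead of contacting the endpoint.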

wdennis
2017-10-23 18:05
So now I still cannot reset up the Workflow from the UX... Just doesn't seem to save it...

greg
2017-10-23 18:21
hmm - Just did those steps and it seems to work for me. Hard refresh the page to make sure you are logged in.

wdennis
2017-10-23 21:10
@greg - OK. Different browser on other machine, ensured I was logged into the endpoint & then DR Beta, went to Workflow, made a new stage map for my profile, saved it, clicked off to another window, clicked back to Workflow and dropped to my Profile, got nuthin.

wdennis
2017-10-23 21:11
Hard refreshed at the Workflow window, now I have a "Login" button... Is the DR login working for me?

zehicle
2017-10-23 21:12
can you see/edit the workflow profile from the profile?

wdennis
2017-10-23 21:12
Clicked the "Login" button, now am at a "Security" page, has my RackN account details, and displays a "Logout" button...

zehicle
2017-10-23 21:12
ultimately, that view is used for editing that variable.

zehicle
2017-10-23 21:14
oh... I think I may know what happened

wdennis
2017-10-23 21:14
@zehicle, you mean edit the Profile, add an undefined param, choosing "change-stage/map(object)"?

zehicle
2017-10-23 21:14
yes

wdennis
2017-10-23 21:14
Yes, I can see/add it

zehicle
2017-10-23 21:15
I think that we updated the UX to rely on a fix to param names that you may not have in your deployed endpoint.

zehicle
2017-10-23 21:16
the / in the param name is making the API call for the UX unhappy. The UX does change behavior for endpoints with that bug.

wdennis
2017-10-23 21:16
"does"? or "doesn't"

wdennis
2017-10-23 21:21
Tried adding the param to the Profile, and even tho I click Save, it does not add...

wdennis
2017-10-23 21:24
OK, about ready to give up on this admittedly very pretty UX...

zehicle
2017-10-23 21:24
the UX does NOT detect if the API has the / defect or not

zehicle
2017-10-23 21:25
it would not be hard to add back the old behavior for older API

wdennis
2017-10-23 21:25
How to update the API to match the UX?

zehicle
2017-10-23 21:25
update the DRP endpoint

wdennis
2017-10-23 21:25
Are the API / UX versioned so they stay matched as to functionality?

zehicle
2017-10-23 21:26
the API is. The UX is not versioned but would need to detect API versions. You've found a version bridging bug

zehicle
2017-10-23 21:27
that came in when we switched to using PATCH instead of PUT (which fixed a different set of bugs)

wdennis
2017-10-23 21:27
So, do I have to go to "tip" to get it working again?

zehicle
2017-10-23 21:28
checking w/ @greg

wdennis
2017-10-23 21:28
I believe my API is latest stable...

wdennis
2017-10-23 21:29
```
[dradmin@dr-admin drp]$ ./dr-provision --version
dr-provision2017/10/23 17:12:16.261448 Version: v3.1.0-0-b70cf8ee1f61844a6d64070a8b272c2bec512204
```

zehicle
2017-10-23 21:32

zehicle
2017-10-23 21:33
patching API is the fast solution... I can also fix the version detection thing pretty fast too

zehicle
2017-10-23 21:34
that issue was in the v3.1 stable code with params that had / in the name.

zehicle
2017-10-23 21:34
really, anything that had / in the name

zehicle
2017-10-23 21:51
I will add version detection into UX to address stable vs tip

wdennis
2017-10-23 23:23
@zehicle OK, how to patch the API?

wdennis
2017-10-23 23:24
My problem is trying to use this in-dev system as a part of my production workflow...

wdennis
2017-10-23 23:28
Sorry, frustrated - Haven't had correct installs since I went to 3.1

zehicle
2017-10-23 23:29
workflow is the new hotness - we're working on adding feature flags and other things to protect stable builds as we move forward

wdennis
2017-10-23 23:29
Sounds like a good plan...

zehicle
2017-10-23 23:30
one idea to help w/ production vs testing.... you can run two versions side by side if you change the API / IPXE ports. Then use API to turn subnets on or off

zehicle
2017-10-23 23:31
that way you could keep them both running and use the subnet active to direct traffic

zehicle
2017-10-23 23:31
they can't share data, but you'd have a fall back

zehicle
2017-10-23 23:32
in the mean time, I'm looking at the UX API version thing

wdennis
2017-10-24 00:10
Interested in patching what I have going now - how to do that?

zehicle
2017-10-24 03:41
@shane and @lae had tested some upgrade patterns. I'm not sure what's required between 3.1.0 and tip

wdennis
2017-10-24 12:16
Testing "tip" to see if UX functions better...

wdennis
2017-10-24 12:16
Running:
```
[dradmin@dr-admin drp-tip]$ ./dr-provision --version
dr-provision2017/10/24 07:59:08.371285 Version: origin/master.travis.16-tip-strange-02f22e8dddac467f6e46279aa8d39cc5c89731d6
```

wdennis
2017-10-24 12:16
Tried to add param to a new profile I created, got this error:


wdennis
2017-10-24 12:21
And from the stdout of dr-provision:
```
[GIN] 2017/10/24 - 03:56:03 | 200 | 13.657µs | 192.168.100.158 | OPTIONS /api/v3/profiles/necla-default-ubuntu
[GIN] 2017/10/24 - 03:56:03 | 406 | 5.922018ms | 192.168.100.158 | PATCH /api/v3/profiles/necla-default-ubuntu
[GIN] 2017/10/24 - 03:56:28 | 200 | 6.168µs | 192.168.100.158 | OPTIONS /api/v3/profiles/necla-default-ubuntu
[GIN] 2017/10/24 - 03:56:28 | 406 | 5.855957ms | 192.168.100.158 | PATCH /api/v3/profiles/necla-default-ubuntu
[GIN] 2017/10/24 - 03:56:29 | 406 | 259.399µs | 192.168.100.158 | PATCH /api/v3/profiles/necla-default-ubuntu
[GIN] 2017/10/24 - 03:56:29 | 406 | 268.972µs | 192.168.100.158 | PATCH /api/v3/profiles/necla-default-ubuntu
[GIN] 2017/10/24 - 03:57:09 | 200 | 8.079µs | 192.168.100.158 | OPTIONS /api/v3/profiles/necla-default-ubuntu
[GIN] 2017/10/24 - 03:57:09 | 406 | 275.049µs | 192.168.100.158 | PATCH /api/v3/profiles/necla-default-ubuntu
```
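
When the dr-provision output gets long, the failing requests can be pulled out of it with a short filter. The sketch below assumes the pipe-separated `[GIN]` line layout shown in the log above (status in the second field, method and path in the fifth); `gin_failures` is a hypothetical helper name, not a DRP tool.

```shell
# Count requests with an HTTP status >= 400 in [GIN] access-log lines.
# Reads the files given as arguments, or stdin when none are given.
gin_failures() {
  awk -F'|' '/^\[GIN\]/ {
    status = $2; gsub(/ /, "", status)                 # e.g. "406"
    req = $5; sub(/^ +/, "", req); sub(/ +$/, "", req) # e.g. "PATCH /api/..."
    if (status + 0 >= 400) count[status " " req]++
  }
  END { for (k in count) print count[k], k }' "$@"
}
```

Run against the excerpt above, this groups all of the rejected `PATCH /api/v3/profiles/necla-default-ubuntu` calls under their 406 status and ignores the successful `OPTIONS` preflights.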

ctrees
2017-10-24 12:42
@wdennis question: This is still your 'quest' for "a way to trigger the Ansible run from a DRP stage" ?

ctrees
2017-10-24 12:44
if so... I've got allocated more time on a 'greenfield' project at work and they greenlighted Ansible and 'what-ever' PXE/KS thing I want...

ctrees
2017-10-24 12:48
thinking that I could follow your path to reproduce issues you're seeing (are you joining the meetup today) ?

wdennis
2017-10-24 12:49
@ctrees I should be able to...

ctrees
2017-10-24 12:50
seems like most the issues are more UI/UX talking to API ?? right ??

wdennis
2017-10-24 12:52
I suppose so... don't mind using drpcli for the more simple things, but writing then passing it big blobs of JSON to set up stuff isn't how I want to spend my day...

ctrees
2017-10-24 12:53
Oh... for me to 'sell' this path, got to make the 'pretty work'....

wdennis
2017-10-24 12:53
I was using "stable" for just that reason, but it seems the UX is always "tip".... and now current UX no longer works with v3.1 stable API...

wdennis
2017-10-24 12:54
Me as well - CLI-only tool no go for my team...

ctrees
2017-10-24 12:56
plus I'll need to make the ansible and ce-_putpackagehere_ then explain the guts... BUT the 'pretty' always seals the deal ... aka worth my time debugging UX for sure...

wdennis
2017-10-24 12:57
So did you know that ce-* stuff can not use the workflow/stages subsystem?

wdennis
2017-10-24 12:57
(Found that out a few days ago)

ctrees
2017-10-24 12:57
I've got to do a timed 'DR' (Disaster Recover)... so when I talk DR they sort of 'oh cool'...

wdennis
2017-10-24 12:58
Only RackN login unlocks that functionality

wdennis
2017-10-24 12:59
Which is fine as long as it remains free to use...

ctrees
2017-10-24 12:59
Yea, I know and I sort of like that... basically because they know the support boundary...

ctrees
2017-10-24 13:02
I'm hoping that also gives a version control of the api which you guys were talking about too...

wdennis
2017-10-24 13:02
At this early stage I wouldn't be able to convince folks here to buck up for support... have to show them a fully-functional "wow" demo before that ever could happen...

wdennis
2017-10-24 13:03
The API *is* versioned; the UX is not

ctrees
2017-10-24 13:10
no, but I think that's possible and what ?? greg ?? is thinking... and I've got the same sort of demo path ahead... the dev side will be making sure they can test everything in the open, but the ops side will need to lock-down customer specific things... which they'll want custom support... so basically I'll need to write 'ce-' packages for dev... so they approve the stack...

shane
2017-10-24 13:24
The Endpoint/UX side issue with version and feature skew is being addressed via the use of "feature tags". The endpoint will have a set of "things it supports and features it can service", and the UX will behave accordingly.

shane
2017-10-24 13:25
Note that this is ... even newer ... than the UX itself. As we've mentioned - the UX itself is Beta release, and believe me - we are working as quickly as we can to stabilize it and sort out all of the details between the Endpoint and UX.

shane
2017-10-24 13:26
We greatly appreciate your frustration in dealing with the fine grained details and issues with this ... and it is very helpful to have your input and feedback on issues.

shane
2017-10-24 13:27
@wdennis - I understand that modifying/managing a lot of JSON isn't very "sexy" - but it is reliable. :slightly_smiling_face: One pattern that I use a lot ... and I really like ... the ability to completely rebuild a DRP Endpoint from scratch in almost zero time at all ...

shane
2017-10-24 13:28
is by dumping the JSON content to disk - and reloading a new DRP endpoint from scratch. This way - you can spin up an Endpoint extremely quickly ... which is very good practice for a modern CI/CD like pipeline - or a "Dev/Test/Stage/Prod" pipeline, if you will ...

shane
2017-10-24 14:00
if that pattern becomes common practice for you - you can very quickly accelerate making changes by modification to the JSON (or YAML if you prefer) blobs ...
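
The "dump the JSON content to disk" half of that pattern might look like the sketch below. It assumes that `drpcli <type> list` prints the objects of that type as JSON on stdout; `dump_drp_objects`, the `DRPCLI` override, and the choice of object types are illustrative, and the reload half is not shown here.

```shell
# Allow the CLI binary to be swapped out (e.g. DRPCLI=echo for a dry run).
DRPCLI="${DRPCLI:-drpcli}"

# Write one <type>.json file per object type into the given directory.
# usage: dump_drp_objects <out-dir> <type> [<type> ...]
dump_drp_objects() {
  out_dir="$1"
  shift
  mkdir -p "$out_dir" || return 1
  for kind in "$@"; do
    "$DRPCLI" "$kind" list > "$out_dir/$kind.json" || return 1
  done
}
```

Something like `dump_drp_objects ./drp-backup profiles templates bootenvs` would then leave one file per type that can be kept in version control alongside the rest of the pipeline.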

shane
2017-10-24 14:00
totally understand that the UX is the sales piece for a lot of the other team members and mgmt

shane
2017-10-24 14:01
...and it's certainly a LOT easier to digest "things" in a visual format from text blobs

snesbitt
2017-10-24 15:53
has joined #json

wdennis
2017-10-24 16:37
@shane Understand that stuff is undergoing active dev, but frustrating when a previously-working system breaks, when I'm running "stable"

wdennis
2017-10-24 16:38
As well, have had some regressions to things (mainly templates) that were working in v3.0 that are not in v3.1

shane
2017-10-24 16:39
@wdennis totally understand the frustration on the UX side - and we're aware of that and addressing the issues - we amended the meetup agenda for today - to include that very issue

shane
2017-10-24 16:40
@snesbitt - welcome

wdennis
2017-10-24 16:41
So putting the UX issues aside - can someone assist me in getting my Profiles / Workflows issues sorted so that my installs are back to expected functionality?

shane
2017-10-24 16:42
sure - I'm happy to help with that - I don't have historical context/experience w/ the 3.0 stuff, but we should be able to work through it - can we focus on that after meetup?

wdennis
2017-10-24 16:42
Sure, and thanks :slightly_smiling_face:

wdennis
2017-10-24 16:44
Remind me when meetup kickoff is again?

shane
2017-10-24 16:44
11am PST

shane
2017-10-24 16:44
1 hr 15 mins

wdennis
2017-10-24 16:44
OK, thought so, thanks

shane
2017-10-24 16:46
quick reminder ... our 3rd online Digital Rebar meetup starts in a short bit (11 am PST) - we hope you can join us - details: https://www.meetup.com/digitalrebar/

snesbitt
2017-10-24 18:43
HI All. I'm struggling to provision a kvm guest with a centos 7 system without any luck. Could someone walk me through the necessary steps?

shane
2017-10-24 18:44
hey @snesbitt happy to help out - right now we're running the community meetup

shane
2017-10-24 18:44
can we touch base in 20 or 30 mins ?

snesbitt
2017-10-24 18:44
No problem. Catch you then.

shane
2017-10-24 18:44
thx

shane
2017-10-24 19:03
@snesbitt - at the moment - we do not have a "plugin" that supports "machine actions" with KVM - that means "rebooting" a KVM guest instance isn't possible directly in Digital Rebar Provision

shane
2017-10-24 19:03
so in a KVM environment, you need to do 2 things:

shane
2017-10-24 19:03
1. manually walk your VM through reboot cycles

shane
2017-10-24 19:04
2. make sure you advance the Machines "profile" through the "discovery" and "OS install" stages you need

snesbitt
2017-10-24 19:06
Couple of questions here. First, do I need a global (or other workflow) and what should it look like? What is the process for advancing the machines profile? And can it be done using the cli or just the gui?

shane
2017-10-24 19:07
you can change/advance the profiles either via the UX (GUI) or the CLI (or directly via API if you choose)

shane
2017-10-24 19:08
that's up to you how you want to do it - easiest way is probably just through the UX - via the "Machines" left panel menu item, then "Edit" machine

shane
2017-10-24 19:08
you can also do it via the "Bulk Actions" page

snesbitt
2017-10-24 19:08
I'm guessing that I need to get the KVM into a stage of Boot LocalEnv and then apply the profile?

shane
2017-10-24 19:09
By default - when a Machine (KVM guest) boots, and it DHCPs against the DRP endpoint - you need a "Subnet" definition to enable that DHCP interaction - unless you're using an external DHCP server, and you have the DHCP server in DRP disabled

shane
2017-10-24 19:10
any "Unknown" Machine action will be set based on the "Info & Preferences" UX page - for "Unknown BootEnv"

snesbitt
2017-10-24 19:10
I am successfully getting it to boot from pxe and into sledgehammer (I think it's sledgehammer - a centos system of some sort).

snesbitt
2017-10-24 19:11
Final question: is there a log that I can access to watch the process?

shane
2017-10-24 19:11
in this case - you'd want to set it to either "ce-discovery" (if you are using only the drp-community-content), or you'd set it to "discovery" if you are using the RackN advanced content (free but requires registration and login of the RackN account)

shane
2017-10-24 19:11
in the UX - you can use the "bullhorn" icon in the upper left, next to the "Endpoint" - that's the "Announce" icon that opens a websocket event stream window

shane
2017-10-24 19:11
you'll see the calls against the DRP endpoint

shane
2017-10-24 19:12
once you've done "discovery" via the Unknown BootEnv - the Default BootEnv becomes the next step in the process

shane
2017-10-24 19:12
in this case - it sounds like you have it set to "Sledgehammer"

shane
2017-10-24 19:12
(or "ce-sledgehammer")

snesbitt
2017-10-24 19:13
Yes. And after that I need to apply a profile as you indicated above?

shane
2017-10-24 19:13
this is what you'd change - switch the profile of the Machine to either "ce-centos..." or "centos..." BootEnv - then reboot the VM
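
The two manual steps for a KVM guest (point the Machine at the next BootEnv, then reboot the VM yourself) could be scripted roughly as follows. This is a sketch under assumptions: it presumes a `drpcli machines bootenv <uuid> <bootenv>` subcommand and a libvirt host where `virsh reboot <domain>` works; `advance_kvm_guest` and the `RUN` dry-run switch are hypothetical names.

```shell
# By default only print the commands; set RUN="" to really execute them.
RUN="${RUN-echo}"

# usage: advance_kvm_guest <machine-uuid> <bootenv-name> <libvirt-domain>
advance_kvm_guest() {
  uuid="$1"
  bootenv="$2"
  domain="$3"
  # Step 1 (assumed drpcli form): switch the Machine to the next BootEnv.
  $RUN drpcli machines bootenv "$uuid" "$bootenv" || return 1
  # Step 2: reboot the guest by hand, since DRP has no KVM plugin for this.
  $RUN virsh reboot "$domain"
}
```

Leaving `RUN` at its default makes it easy to review the exact commands before pointing them at a real endpoint and hypervisor.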

snesbitt
2017-10-24 19:13
Ok, I'll give it a try. Thx!!

shane
2017-10-24 19:13
exactly !

shane
2017-10-24 19:14
the advanced RackN content (free - requires registration) ... has "Stages" and "Workflow" that can advance Machines through this process ...

snesbitt
2017-10-24 19:14
I am signed up so I have this.

shane
2017-10-24 19:14
but as I mentioned - we don't at the moment have a Plugin to support "Machine Actions" on the KVM side - we'll have to add that plugin type (it's fairly highly requested, so it'll get there)

shane
2017-10-24 19:14
right ... but we don't have a plugin for you right now :slightly_smiling_face:

snesbitt
2017-10-24 19:14
Lots of promise here so best of luck!

shane
2017-10-24 19:15
you can do some massaging of the actual tasks/bootenvs flow - so if you __know__ how you want your Machines to advance, you can inject those actions inside of the Tasks themselves

snesbitt
2017-10-24 19:15
One step at a time!

shane
2017-10-24 19:16
but our design by default is to handle that externally with allowing DRP Endpoint to push a Machine through changes via IPMI or "IPMI-like" (i prefer "Machine Actions") changes

snesbitt
2017-10-24 19:17
Oh, one last question. After I boot from a localenv and then reboot kvm, how does kvm reattach to the end point. I've kinda tried this and get a kvm boot failure with no bootable disk

shane
2017-10-24 19:29
you have to make sure any disks you have set for your VM Guest are set to "persist" across restarts of the VM and KVM host

shane
2017-10-24 19:29
that's a standard "KVM thingy"

wdennis
2017-10-24 21:03
any idea of what this DRP startup error means?


shane
2017-10-24 21:04
that's awfully blurry and hard to read - but it looks like it was looking for json or yaml and didn't find a validly formed file/contents

shane
2017-10-24 21:05
ah - blew up the pix - invalid character "m"

wdennis
2017-10-24 21:05
In what?

shane
2017-10-24 21:06
(jeesh - you've got some dust on that monitor ... :slightly_smiling_face: )

wdennis
2017-10-24 21:06
It's a lab, what can I say...

shane
2017-10-24 21:06
some file - unfortunately that error isn't giving us the file name that's causing it to barf

wdennis
2017-10-24 21:06
So that was my prior running DRP "stable"

wdennis
2017-10-24 21:07
I had stopped it, created another directory, and installed latest "tip" in that

wdennis
2017-10-24 21:08
Fired that up, hit UX problems (see above) and decided to go back to using my old stable

wdennis
2017-10-24 21:08
Which now does not start...

greg
2017-10-24 21:09
That is one of the files in drp-data/saas-content

greg
2017-10-24 21:09
It appears to be invalid yaml.

wdennis
2017-10-24 21:11
Any cli-based YAML linters out there?

shane
2017-10-24 21:11
yes

wdennis
2017-10-24 21:11
Don't know what changed, was running with those files before

vlowther
2017-10-24 21:12
More to the point, did you back up your backing store when you moved from stable -> tip?


shane
2017-10-24 21:12
it interprets yaml and will barf if it's not right ... so "sorta linter"

shane
2017-10-24 21:12
I use it in the "5min-drp" example demo stuff I use

vlowther
2017-10-24 21:13
as an upgrade from stable to tip can (and does, in this case) change the internal format of some of the backing store objects.

wdennis
2017-10-24 21:13
Was not an in-place upgrade, installed tip as isolated in another dir

shane
2017-10-24 21:14
right - but did you point the "tip" version at the "stable" version content directory ?

wdennis
2017-10-24 21:14
No

shane
2017-10-24 21:14
good man

vlowther
2017-10-24 21:15
ok -- can you tar up your stable content directory and send it to me in a PM?

shane
2017-10-24 21:17
@wdennis...there's an online yaml linter ... : http://www.yamllint.com/

shane
2017-10-24 21:18
`sudo dnf install yamllint` `sudo apt-get install yamllint` from: https://github.com/adrienverge/yamllint

wdennis
2017-10-24 21:23
@shane thx, using yamllint - all files check out (excepting pedantic "line too long" and "wrong indentation" errors)


greg
2017-10-24 21:26
Okay - so that may mean that you have tip content with the stable drp.

greg
2017-10-24 21:27
`ls -al drp-data/saas-content`

greg
2017-10-24 21:27
nvm - I see it

greg
2017-10-24 21:28
That might do it too. tip content.

greg
2017-10-24 21:29
Can you send us those four files, please?

wdennis
2017-10-24 21:30
Sending to @vlowther now

wdennis
2017-10-24 21:35
Or should I just bite the bullet and start running with "tip" and rebuild my DRP world?

vlowther
2017-10-24 21:37
so the tarball has 6 files

vlowther
2017-10-24 21:37
drp-community-content-v1.0.0-tip-dradmin-dev-10-5dc611603bba0352c887efed813e62ec8451f32f.yaml
kubespray-v1.0.0-tip-36-ce4ecc6205224c8cb6144ff35fc82c59e0301183.yaml
os-discovery-v1.0.0-tip-30-df57b2959b32fd674ed749dd430b0e823658d4bc.yaml.bak
os-discovery-v1.0.0-tip-36-ce4ecc6205224c8cb6144ff35fc82c59e0301183.yaml
os-linux-v1.0.0-tip-travis-dev-33-a5f9ea9af72b14eb8c02e60ee5b1eb11485d7b3e.yaml
yq

vlowther
2017-10-24 21:37
Where did the yq and .bak file come from?

vlowther
2017-10-24 21:39
Either way, the store implementation assumes that anything that isn't .yaml or .yml is JSON

vlowther
2017-10-24 21:39
so the presence of extra files in the saas-contents directory will cause load failures.

wdennis
2017-10-24 21:40
Well - the .bak is a backup of prior os-discovery and yq is a YAML linter that @shane recommended

wdennis
2017-10-24 21:41
Can delete and retry

vlowther
2017-10-24 21:41
please do.

wdennis
2017-10-24 21:42
Live and learn

wdennis
2017-10-24 21:43
So takeaway is only valid DRP yaml or json files in saas-content - :white_check_mark:

vlowther
2017-10-24 21:43
Yep.

vlowther
2017-10-24 21:44
Although we will add a check for obviously-bad files.
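
A minimal sketch of the kind of pre-flight check discussed above, assuming the only rule is the one stated in this thread (files ending in .yaml or .yml are read as YAML, anything else is treated as JSON). The names `find_stray_files` and `check_content_dir` are hypothetical and are not part of DRP:

```python
import os

# Suffixes considered safe: YAML is detected by suffix, and everything
# else is assumed to be parsed as JSON, so only .json is safe otherwise.
ALLOWED_SUFFIXES = (".yaml", ".yml", ".json")


def find_stray_files(names):
    """Return the names that do not end in an expected content suffix."""
    return sorted(n for n in names if not n.lower().endswith(ALLOWED_SUFFIXES))


def check_content_dir(path):
    """List stray files in a content directory such as drp-data/saas-content."""
    return find_stray_files(os.listdir(path))
```

Run against the six-file listing above, this would flag the `.bak` copy and the `yq` binary, which are the two files that broke the load.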

wdennis
2017-10-24 21:44
Silly users

vlowther
2017-10-24 21:44
breaking my Perfect (TM) code!

wdennis
2017-10-24 21:45
Now with BugFree(tm) !!!1!

vlowther
2017-10-24 21:46
Either way, time to head out.

wdennis
2017-10-24 21:46
Thanks

wdennis
2017-10-24 21:49
So @shane - can we work on the drpcli way of setting up the workflow I want against my profile?

shane
2017-10-24 21:50
sure - give me 10 mins

wdennis
2017-10-24 21:50
Ack

shane
2017-10-24 21:59
ok

wdennis
2017-10-24 22:00
So, I have a custom profile with the param settings i want for the install templates

shane
2017-10-24 22:01
ok - with existing content - it's pretty easy to dump all the things - and save them out for reload

shane
2017-10-24 22:02
you might want to install a new DRP endpoint just for this testing - to shovel content into - and see how it looks

shane
2017-10-24 22:02
if you don't provision against it - you can disable dhcp/tftp - and set the API port to an alternate number - and you can then hit it via UX IP:alt_port_number

shane
2017-10-24 22:02
to both watch it being built, and see that it's "right" the way you want it

wdennis
2017-10-24 22:03
Trying to set up a workflow to install U16.04, then move to ssh-access, then done

shane
2017-10-24 22:03
the whole process is "codified" in bash - via the 5min-drp stuff in git: https://github.com/digitalrebar/provision/tree/master/examples/5min-drp

shane
2017-10-24 22:04
basically - the "demo-run.sh" is a driver to the bin/control.sh script - which does the main work

wdennis
2017-10-24 22:04
So I should look at that and see if I can do it?

shane
2017-10-24 22:05
you can - it's designed to spin up a http://packet.net instance, use terraform, and plumb up a DRP endpoint with the RackN registered content ... then provision nodes in http://packet.net

shane
2017-10-24 22:05
that's a bunch of stuff you don't really need (http://packet.net and terraform) (presumably)

shane
2017-10-24 22:06
so what we want to do more than anything else, is just observe the patterns of how I did that there

wdennis
2017-10-24 22:06
OK, fair enough

wdennis
2017-10-24 22:06
Let me take a look and see what I can get done

shane
2017-10-24 22:06
the drp-install stuff is boring - you're probably pretty good at spinning up an Endpoint, by now :slightly_smiling_face:

shane
2017-10-24 22:07
if you look at the bin/control.sh usage statement, you get:
```
USAGE: $0 [arguments]
WHERE: arguments are as follows:

    help | usage              this help statement
    install-terraform         installs terraform locally
    install-secrets           installs API and PROJECT secrets for Terraform files
    ssh-keys                  generates new ssh keys, REMOVES existing keys first
    set-drp-endpoint <ID>     sets the drp-machines.tf endpoint information for Terraform
    get-drp-local             installs DRP locally
    get-drp-cc                installs DRP *community* content
    get-drp-plugins           installs DRP Packet Plugins
    drp-install <ID>          install DRP and basic content as identified by <ID>
    remote-content <ID>       do 'get-drp-cc' and 'get-drp-plugins' on remote <ID>
    drp-setup <ID>            perform content and plugins setup on <ID> endpoint
    get-drp-id                get the DRP endpoint server ID
    get-address <ID>          get the IP address of new DRP server identified by <ID>
    ssh <ID> [COMMANDS]       ssh to the IP address of DRP server identified by <ID>
    scp <ID> [FILES]          ssh to the IP address of DRP server identified by <ID>
    cleanup                   WARNING WARNING WARNING
```

wdennis
2017-10-24 22:08
So what I really need is to set up the workflow (stage maps)

shane
2017-10-24 22:08
the interesting bits for you are the `drp-install` (contents bits), `get-drp-plugins` (may or may not apply depending on what you need), and `drp-setup`

shane
2017-10-24 22:09
each of those three steps is just a case statement in bash - so you can search for them via `drp-install)` (note the closing parenthesis)

shane
2017-10-24 22:09
when searching

shane
2017-10-24 22:09
in each of the steps - I'm being pedantic about deleting any content if it exists first, and then loading it again ...

shane
2017-10-24 22:10
it allows me to repeat the process over and over against an existing endpoint cleanly, ensuring I wipe content clean and reinstall with new (possibly the same, or possibly updated/modified) content

shane
2017-10-24 22:10
that's just a general pattern

shane
2017-10-24 22:10
if you know you start from scratch every time - you can skip the test/wipe/upload steps and just do the "upload" (eg "create") steps

shane
2017-10-24 22:11
does that make sense ?

wdennis
2017-10-24 22:11
Let's see :slightly_smiling_face:

wdennis
2017-10-24 22:11
Gotta run now - but will pick this up later tonight

shane
2017-10-24 22:11
cool

wdennis
2017-10-24 22:11
Thx for help, we'll see what I can figure out...

shane
2017-10-24 22:12
hmm - skip `drp-install` altogether

shane
2017-10-24 22:12
I restructured it to put everything in `drp-setup` (install just "installs"; as the name suggests, now)

wdennis
2017-10-24 22:12
OK

ejk
2017-10-25 16:45
has joined #json

zehicle
2017-10-25 18:30
Welcome @ejk!

zehicle
2017-10-25 18:34
coming change for UX (as per call yesterday) will be to start respecting feature flags from the endpoint. First impact will be that the Workflow page requires being on tip (or a very recent build of the endpoint that supports Features flag).

wdennis
2017-10-25 21:58
So @shane looks like the relevant section of 5min-drp `bin/control.sh` for my needs (creating stage map for a profile) is lines 520-539

wdennis
2017-10-25 21:58
correct?

shane
2017-10-25 21:59
In mtg now until 5pm

wdennis
2017-10-25 22:00
OK, catch ya later

shane
2017-10-25 22:07
the `drp-setup)` case stanza should be the majority of what you need

shane
2017-10-26 00:00
@wdennis - back online - you still need some help ?

shane
2017-10-26 00:01
@ejk - welcome to our little #community here ... :slightly_smiling_face:

wdennis
2017-10-26 01:38
@shane you still around?

shane
2017-10-26 01:39
for a couple minutes ... I have a huge hunk of marinated tri-tip on the barbie ...

wdennis
2017-10-26 01:39
Got the profile reestablished with the stage map, but now when I put it against the machine I want to re-roll, it isn't setting the bootenv like it was before...

wdennis
2017-10-26 01:43
I set it manually thru the UX, so now it looks like:
```
[dradmin@dr-admin drp]$ drpcli machines show 5fcbf69d-287e-4c2c-b085-5858665cd442
{
  "Address": "192.168.1.143",
  "Available": true,
  "BootEnv": "ubuntu-16.04-install",
  "CurrentTask": 0,
  "Description": "Dell PE 860",
  "Errors": [],
  "Name": "testnode01",
  "Profile": {
    "Available": false,
    "Errors": null,
    "Name": "",
    "ReadOnly": false,
    "Validated": false
  },
  "Profiles": [
    "necla-default-ubuntu"
  ],
  "ReadOnly": false,
  "Runnable": true,
  "Tasks": [],
  "Uuid": "5fcbf69d-287e-4c2c-b085-5858665cd442",
  "Validated": true
}
```

2017-10-26 01:43
Time to feed the :bear:!

shane
2017-10-26 01:43
what does your stage map look like ?

wdennis
2017-10-26 01:44
```
[dradmin@dr-admin drp]$ drpcli profiles show necla-default-ubuntu
{
  "Available": true,
  "Description": "NECLA Default Stage-map",
  "Errors": [],
  "Name": "necla-default-ubuntu",
  "Params": {
    "change-stage/map": {
      "ssh-access": "complete-nowait:Success",
      "ubuntu-16.04-install": "ssh-access:Success"
    }
  },
  "ReadOnly": false,
  "Validated": true
}
```

wdennis
2017-10-26 01:47
looks OK? (or not?)

shane
2017-10-26 01:48
I'm not sure about the "ssh-access" piece ... did Greg provide that for you ? I'm used to manipulating from "discover" forward

wdennis
2017-10-26 01:49
It's one of the RackN-provided stages, yes?

shane
2017-10-26 01:50
yes it is - but you have to have a Machine enter the "right stage" to kick off the stage workflow

wdennis
2017-10-26 01:50
The order is supposed to be: ubuntu-16.04-install --> ssh-access --> done

wdennis
2017-10-26 01:50
So in the JSON, the order matters?

wdennis
2017-10-26 01:51
Here's the JSON file I wrote:
```
[dradmin@dr-admin drp]$ cat necla-default.json
{
  "Available": true,
  "Description": "NECLA Default Stage-map",
  "Name": "necla-default-ubuntu",
  "Params": {
    "change-stage/map": {
      "ubuntu-16.04-install": "ssh-access:Success",
      "ssh-access": "complete-nowait:Success"
    }
  }
}
```

wdennis
2017-10-26 01:53
And then (after deleting the old `necla-default-ubuntu` profile first) I did: `drpcli profiles create - < necla-default.json`

wdennis
2017-10-26 01:54
DRP seemingly reordered it to:
```
  "Params": {
    "change-stage/map": {
      "ssh-access": "complete-nowait:Success",
      "ubuntu-16.04-install": "ssh-access:Success"
    }
  },
```
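
On the ordering question: JSON object members are unordered, and the alphabetical order in the output is consistent with how Go's JSON encoder writes map keys (sorted), so the reordering is most likely cosmetic. The sketch below assumes the change-stage task looks up the entry for the machine's current stage by name, which is what the "Machine's current stage" lines in the later job log suggest. `walk_stage_map` is a hypothetical helper, not DRP code:

```python
import json


def walk_stage_map(stage_map, start):
    """Follow 'next-stage:Result' entries from start until no entry matches."""
    order = [start]
    current = start
    while current in stage_map:
        next_stage = stage_map[current].split(":", 1)[0]
        if next_stage in order:  # guard against a loop in the map
            break
        order.append(next_stage)
        current = next_stage
    return order


as_written = json.loads(
    '{"ubuntu-16.04-install": "ssh-access:Success",'
    ' "ssh-access": "complete-nowait:Success"}'
)
as_returned = json.loads(
    '{"ssh-access": "complete-nowait:Success",'
    ' "ubuntu-16.04-install": "ssh-access:Success"}'
)

# Key order differs, but the parsed objects compare equal and the walk
# from the install stage visits the same stages in the same sequence.
print(as_written == as_returned)
print(walk_stage_map(as_returned, "ubuntu-16.04-install"))
```

Under that assumption, what matters is which stage the machine starts in, not the order of the keys in the file.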

shane
2017-10-26 01:55
the process is right - I think it's just a matter of the stage map being right to match where your Machine is at currently - and where you want it to get to

shane
2017-10-26 01:55
TBH - I'm not sure about that flow - and we'd need greg to weigh in on that

wdennis
2017-10-26 01:57
I think the machine's bootenv needs to be set to a stage in the stage map, which it is

wdennis
2017-10-26 01:59
n/m... The *stage* needs to be set, not the bootenv

wdennis
2017-10-26 02:00
What confuses me is that the bootenv and the stage are named the same

wdennis
2017-10-26 02:04
Set the stage appropriately, and now the bootenv is set to the same automatically... which is right

wdennis
2017-10-26 02:04
Now I have:
```
[dradmin@dr-admin drp]$ drpcli machines show 5fcbf69d-287e-4c2c-b085-5858665cd442
{
  "Address": "192.168.1.143",
  "Available": true,
  "BootEnv": "ubuntu-16.04-install",
  "CurrentTask": -1,
  "Description": "Dell PE 860",
  "Errors": [],
  "Name": "testnode01",
  "Profile": {
    "Available": false,
    "Errors": null,
    "Name": "",
    "ReadOnly": false,
    "Validated": false
  },
  "Profiles": [
    "necla-default-ubuntu"
  ],
  "ReadOnly": false,
  "Runnable": true,
  "Stage": "ubuntu-16.04-install",
  "Tasks": [
    "change-stage"
  ],
  "Uuid": "5fcbf69d-287e-4c2c-b085-5858665cd442",
  "Validated": true
}
```

wdennis
2017-10-26 02:07
PXE-booted the node, let's see what I get...

shane
2017-10-26 02:11
My WiFi capped out, and dinner is on the table - need input on stage map transitions. The Ubuntu install completed successfully, right?

wdennis
2017-10-26 02:13
enjoy dinner, talk to you tomorrow - installs on these old Dells take a while, and no remote console :cry:

wdennis
2017-10-26 12:53
Good morning... The node did install, but it's still not picking up my ssh-access params...

wdennis
2017-10-26 12:53
I see these job log outputs from the run:
```
Log for Job: 4353e00c-5ece-4035-a58e-a7ee44a37790
Starting Content Execution for: change-stage.sh.tmpl
Error: Failed to fetch info info: [GET /info][403] getInfoForbidden
DRP does NOT support 'sane-exit-codes' using old codes ...
Machine's current stage: ubuntu-16.04-install
Checking for data: ubuntu-16.04-install from ssh-access:Success
Attempting to test Stage to ssh-access and return 0
{ "Address": "192.168.1.143", "Available": true, "BootEnv": "ubuntu-16.04-install", "CurrentJob": "4353e00c-5ece-4035-a58e-a7ee44a37790", "CurrentTask": -1, "Description": "Dell PE 860", "Errors": [], "Name": "testnode01", "Profile": { "Available": false, "Errors": null, "Name": "", "ReadOnly": false, "Validated": false }, "Profiles": [ "necla-default-ubuntu" ], "ReadOnly": false, "Runnable": true, "Stage": "ssh-access", "Tasks": [ "ssh-access", "change-stage" ], "Uuid": "5fcbf69d-287e-4c2c-b085-5858665cd442", "Validated": true}\n
Changed stage successfully: Returning 0
Command change-stage.sh.tmpl succeeded

Log for Job: d1e496aa-999f-48ab-9901-b87709656593
Starting Content Execution for: access-keys.sh.tmpl
Updating SSHD default values
Restarting ssh
 * Restarting OpenBSD Secure Shell server sshd
   ...done.
Finished updating access keys successfully
Command access-keys.sh.tmpl succeeded

Log for Job: e29e9a2f-58d2-4ffa-99a2-d366b949387d
Starting Content Execution for: change-stage.sh.tmpl
Error: Failed to fetch info info: [GET /info][403] getInfoForbidden
DRP does NOT support 'sane-exit-codes' using old codes ...
Machine's current stage: ssh-access
Checking for data: ssh-access from complete-nowait:Success
Attempting to test Stage to complete-nowait and return 0
{ "Address": "192.168.1.143", "Available": true, "BootEnv": "local", "CurrentJob": "e29e9a2f-58d2-4ffa-99a2-d366b949387d", "CurrentTask": 0, "Description": "Dell PE 860", "Errors": [], "Name": "testnode01", "Profile": { "Available": false, "Errors": null, "Name": "", "ReadOnly": false, "Validated": false }, "Profiles": [ "necla-default-ubuntu" ], "ReadOnly": false, "Runnable": true, "Stage": "complete-nowait", "Tasks": [], "Uuid": "5fcbf69d-287e-4c2c-b085-5858665cd442", "Validated": true}\n
Changed stage successfully: Returning 0
Command change-stage.sh.tmpl succeeded
```

greg
2017-10-26 13:08
The log entry before that one is the one we need I think

greg
2017-10-26 13:08
Nvm

wdennis
2017-10-26 13:22
Ah, I see the problem - I didn't include the SSH-related params in the recreated profile!

wdennis
2017-10-26 13:24
so now then, here's the new JSON to recreate the profile:
```
[dradmin@dr-admin drp]$ cat necla-default.json
{
  "Available": true,
  "Description": "NECLA Default Stage-map for Ubuntu installs",
  "Name": "necla-default-ubuntu",
  "Params": {
    "change-stage/map": {
      "ubuntu-16.04-install": "ssh-access:Success",
      "ssh-access": "complete-nowait:Success"
    },
    "access-keys": {
      "root": "ssh-rsa <redacted> will@Wills-MacBook-Air"
    },
    "access-ssh-root-mode": "yes"
  }
}
```

wdennis
2017-10-26 13:25
And then...
```
[dradmin@dr-admin drp]$ drpcli profiles destroy necla-default-ubuntu
Deleted profile necla-default-ubuntu
[dradmin@dr-admin drp]$ drpcli profiles create - < necla-default.json
{
  "Available": true,
  "Description": "NECLA Default Stage-map for Ubuntu installs",
  "Errors": [],
  "Name": "necla-default-ubuntu",
  "Params": {
    "access-keys": {
      "root": "ssh-rsa <redacted> will@Wills-MacBook-Air"
    },
    "access-ssh-root-mode": "yes",
    "change-stage/map": {
      "ssh-access": "complete-nowait:Success",
      "ubuntu-16.04-install": "ssh-access:Success"
    }
  },
  "ReadOnly": false,
  "Validated": true
}
```
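
Recreating a profile without some of its params is an easy slip, and a small check before running `drpcli profiles create` can catch it. This is only a sketch: the list of required params is taken from this conversation, and `missing_params` is a hypothetical helper rather than anything drpcli provides:

```python
# Param names this particular workflow relies on, per the discussion above.
REQUIRED_PARAMS = ("change-stage/map", "access-keys", "access-ssh-root-mode")


def missing_params(profile, required=REQUIRED_PARAMS):
    """Return required param names absent from a profile's Params object."""
    params = profile.get("Params") or {}
    return [name for name in required if name not in params]
```

Loading `necla-default.json` with `json.load` and passing the result to `missing_params` would have reported the two SSH params as missing from the first version of the file.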

wdennis
2017-10-26 13:26
Applied the updated profile to the test host, let's PXE and see what I get this time...

greg
2017-10-26 14:20
Well we need to make sure that the machine has ssh keys.

greg
2017-10-26 14:21
The stages and tasks ran

greg
2017-10-26 14:22
`drpcli machines params <uuid> --aggregate`

greg
2017-10-26 14:22
You should see an access-keys parameter with the keys you want installed

wdennis
2017-10-26 14:37
OK, install is complete; did the above and this is what I see:
```
[dradmin@dr-admin drp]$ drpcli machines params 5fcbf69d-287e-4c2c-b085-5858665cd442 --aggregate
{
  "access-keys": {
    "root": "ssh-rsa <redacted> will@Wills-MacBook-Air"
  },
  "access-ssh-root-mode": "yes",
  "change-stage/map": {
    "ssh-access": "complete-nowait:Success",
    "ubuntu-16.04-install": "ssh-access:Success"
  }
}
```

wdennis
2017-10-26 14:37
So I do have the correct access key injected

wdennis
2017-10-26 14:38
However... it did not perform the correct action as regards `access-ssh-root-mode`

wdennis
2017-10-26 14:41
The `access-ssh-root-mode` param should utilize the `root-remote-access.tmpl` correct? ```[dradmin@dr-admin drp]$ drpcli templates show root-remote-access.tmpl { "Available": true, "Contents": "#\n# This template populates the root's authorized keys file\n# and makes sure that the sshd config for PermitRootLogin is populated.\n#\n# Runs as part of a shell script for kickstart or net-post-install\n# The template does nothing if proxy-servers is undefined\n#\n# Required Parameters: access_keys\n# Optional Parameters: access_ssh_root_mode\n#\n# Parameter YAML format:\n#\n# access_keys:\n# greg: ssh-rsa key\n# greg2: ssh-rsa key\n# access_ssh_root_mode: \"without-password|yes|no|forced-commands-only\"\n#\n# Defaults:\n# access_keys - empty\n# access_ssh_root_mode - defaults to \"without-password\" if unspecified\n#\n\n{{if .ParamExists \"access_keys\"}}\nmkdir -p /root/.ssh\ncat \u003e/root/.ssh/authorized_keys \u003c\u003cEOFSSHACCESS\n### BEGIN GENERATED CONTENT\n{{ range $key := .Param \"access_keys\" }}\n{{$key}}\n{{ end }}\n### END GENERATED CONTENT\nEOFSSHACCESS\n{{end}}\n\nsed --in-place -re '/^PermitRootLogin/ s/prohibit-password/{{if .ParamExists \"access_ssh_root_mode\"}}{{.Param \"access_ssh_root_mode\"}}{{else}}without-password{{end}}/' /etc/ssh/sshd_config\n\necho \"AcceptEnv http_proxy https_proxy no_proxy\" \u003e\u003e /etc/ssh/sshd_config\n", "Errors": null, "ID": "root-remote-access.tmpl", "ReadOnly": false, "Validated": true }```

greg
2017-10-26 14:41
Okay. So the issue is around the root mode.

greg
2017-10-26 14:42
We are missing a service restart. I think.

wdennis
2017-10-26 14:42
Trying to test the "sed" in the above, which proposes to replace the `echo "PermitRootLogin yes" >> /etc/ssh/sshd_config` in the current RackN version

greg
2017-10-26 14:54
Something seems amiss; This is my template:
```
#!/bin/bash

{{if .ParamExists "access-keys"}}
echo "Putting ssh access keys for root in place"
mkdir -p /root/.ssh
cat >>/root/.ssh/authorized_keys <<EOFSSHACCESS
### BEGIN Access Keys GENERATED CONTENT
{{range $key := .Param "access-keys"}}
{{$key}}
{{end}}
### END Access Keys GENERATED CONTENT
EOFSSHACCESS
chmod 600 /root/.ssh/authorized_keys
{{end}}

echo "Updating SSHD default values"
echo "PermitRootLogin {{if .ParamExists "access-ssh-root-mode"}}{{.Param "access-ssh-root-mode"}}{{else}}without-password{{end}}" >> /etc/ssh/sshd_config
echo "AcceptEnv http_proxy https_proxy no_proxy" >> /etc/ssh/sshd_config

# Restart sshd but os badness.
. /etc/os-release

# Ignore error because we may run in a place that doesn't have ssh installed
if [[ "$ID" == "ubuntu" || "$ID" == "debian" ]] ; then
    echo "Restarting ssh"
    service ssh restart || true
else
    echo "Restarting sshd"
    service sshd restart || true
fi

echo "Finished updating access keys successfully"
exit 0
```

greg
2017-10-26 14:55
It has the busted echo injection. Needs the sed replacement.
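
Some background on why appending is a problem: sshd uses the first value it finds for each keyword, so a `PermitRootLogin` line appended to the end of sshd_config is ignored when an earlier active `PermitRootLogin` line exists. The sed in the community template edits in place, but as quoted it only matches when the existing value is `prohibit-password`. A sketch of a more general "replace the active line, else append" approach follows; `set_sshd_option` is a hypothetical helper, and it simplifies by matching the keyword case-sensitively:

```python
import re


def set_sshd_option(config_text, keyword, value):
    """Set keyword to value on its first active line, or append if absent."""
    # Match an uncommented line starting with the keyword; [ \t] keeps the
    # match on a single line so neighbouring blank lines are untouched.
    pattern = re.compile(
        r"^[ \t]*" + re.escape(keyword) + r"[ \t]+.*$", re.MULTILINE
    )
    new_line = f"{keyword} {value}"
    if pattern.search(config_text):
        return pattern.sub(lambda _m: new_line, config_text, count=1)
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + new_line + "\n"
```

Either way, sshd still has to be restarted afterwards for the change to take effect, which the template above already does.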

wdennis
2017-10-26 15:16
So I'm missing the sshd restart bit?

greg
2017-10-26 15:20
yes

wdennis
2017-10-26 15:27
So, I made my own custom community content (cloned the `provision-content` repo, made changes, built per your instructions), and copied this file over to my DRP installation and put it in place...

wdennis
2017-10-26 15:27
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F7QBKKQFM/drp-community-content.yaml and commented: custom community content

greg
2017-10-26 15:29
The community template is different from the RackN template.

wdennis
2017-10-26 15:29
That's how I got my changed `root-remote-access.tmpl` (it seems to be the same as the `ce-root-remote-access.tmpl` in what I built)

greg
2017-10-26 15:29
sigh - I'm trying to fix this.

wdennis
2017-10-26 15:29
"sigh" too...

greg
2017-10-26 15:29
Okay - so here is what is coming hopefully in the next day or two.

wdennis
2017-10-26 15:29
Lay it on me

greg
2017-10-26 15:30
To address this issue.

greg
2017-10-26 15:30
1. Pull in @lae's patches to community content.

greg
2017-10-26 15:31
2. Rework community content and RackN content to provide a stage/task-based system that will work without a change-map (similar to existing functions).

greg
2017-10-26 15:32
Community content will be in two parts core and contrib. We will put a basic set of things in core and contrib will be where we put things that are less supported from the community.

wdennis
2017-10-26 15:32
^^^ sounds reasonable

greg
2017-10-26 15:32
RackN content will lose the os-linux and os-discovery. They will become core.

greg
2017-10-26 15:33
Some of the os-linux items will move into contrib or os-other. Depending upon support requirements.

greg
2017-10-26 15:33
There will be a task library content pack, which will be RackN content, with additional stages and tasks that do special functions, like post-post-install runners and other things.

greg
2017-10-26 15:34
Content packs associated with plugins will move into the plugins to allow for proper version tracking and consistency (this change just went in).

wdennis
2017-10-26 15:34
So, this all in 3.2?

greg
2017-10-26 15:34
Oh - I stopped numbering, my bad.

greg
2017-10-26 15:35
Well - into tip in steps and completed for v3.2

wdennis
2017-10-26 15:35
OK, that's what I meant

greg
2017-10-26 15:35
feature flags are going to be used to mark feature breaking things.

greg
2017-10-26 15:35
as we go.

wdennis
2017-10-26 15:36
Sounds like a reinstall rather than upgrade from v3.1 to v3.2... a lot of architectural changes coming down the pike...

greg
2017-10-26 15:37
actually, you don't have to reinstall, but you do have to tweak a couple of things around your read-only content.

greg
2017-10-26 15:37
Let me do that real quick: - updating to tip requires a couple of changes. And should follow this sequence.

greg
2017-10-26 15:38
1. Update drp to tip and restart. Previous plugins will work.

greg
2017-10-26 15:38
2. Remove plugin-providers using UX.

greg
2017-10-26 15:38
3. Remove `packet`, `virtualbox`, and `ipmi` content bundles if installed.

greg
2017-10-26 15:39
4. Re-add latest tip plugins as needed. This will add the previously removed content packs back. None of the objects have been changed.

greg
2017-10-26 15:40
5. Optionally update other content packs to latest tip.

greg
2017-10-26 15:41
6. IMPORTANT: Always check `stages`, `bootenvs` to make sure that your used objects are available. A likely change will be a new sledgehammer update. Update this by using the ISO upload feature in the UX or `drpcli bootenvs uploadiso sledgehammer`
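
For step 6, one way to spot objects that became unavailable after an update is to scan the JSON that drpcli prints. The sketch below assumes the input is a JSON array of objects, as produced by a listing command such as `drpcli bootenvs list` or `drpcli stages list`, and that each object carries the `Name`, `Available`, and `Errors` fields seen in the output earlier in this log. `unavailable` is a hypothetical helper:

```python
import json


def unavailable(objects_json):
    """Return (name, errors) for each listed object whose Available is false."""
    problems = []
    for obj in json.loads(objects_json):
        if not obj.get("Available", False):
            problems.append((obj.get("Name", "?"), obj.get("Errors") or []))
    return problems
```

An empty result for both stages and bootenvs would suggest the objects in use survived the update; anything listed points at content or an ISO that needs refreshing.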

greg
2017-10-26 15:42
The goal of these coming changes is to allow for stage-based operations to be the default everywhere for everyone to reduce confusion.

greg
2017-10-26 15:48
@lae - I'm pulling your changes but I'm not going to move tip on that tree until I get some of the other changes stabilized.

wdennis
2017-10-26 15:58
So @greg - with all the changes to UX, etc - should I do the upgrade to "tip"? (Is it stable enough for day-to-day work?)

wdennis
2017-10-26 15:59
I'm not getting usable installs with what I have now...

greg
2017-10-26 16:00
I think so. I'd like to go through a few more installs.

greg
2017-10-26 16:00
I would like that ssh sed line to test though. Can I pull that in?

wdennis
2017-10-26 16:01
Was trying to test it first, but that's not been successful... Let me go ahead and send the pull request I have started for it

wdennis
2017-10-26 16:04
done

greg
2017-10-26 16:04
cool thanks. I'll lift into the merged pieces.

wdennis
2017-10-26 16:05
It'll have to be ported over to the "official" RackN `root-remote-access.tmpl` as well

greg
2017-10-26 16:06
yeah - I'm merging those as we speak.

wdennis
2017-10-27 00:26
OK, think I dug the hole I'm in a bit deeper...

wdennis
2017-10-27 00:29
Updated Content to latest, now I get this on the ubuntu-16.04-install bootenv...

shane
2017-10-27 00:53
@wdennis do you have the RackN "os-linux" content pack installed ?? those are all artifacts from that content pack

wdennis
2017-10-27 00:54
Have v1.0.0.-tip-39-...

wdennis
2017-10-27 00:55
Running on DRP v3.1.0 (stable) - that may be the issue?

wdennis
2017-10-27 00:55
So much has changed since then?

shane
2017-10-27 00:55
yeah - you can't mix tip content and stable right now

shane
2017-10-27 00:55
:slightly_smiling_face: yes - thanks to feedback and input from our community and users

wdennis
2017-10-27 00:56
I think I need to update to tip?

2017-10-27 00:56
We're working on UX patch that will allow you to pick content versions.... hopefully in review tonight

wdennis
2017-10-27 01:40
Updated DRP to "tip" (hopefully) - can anyone confirm this is latest tip version?
```
[dradmin@dr-admin drp]$ ./dr-provision --version
dr-provision2017/10/26 21:22:07.330924 Version: v3.1.0-0-b70cf8ee1f61844a6d64070a8b272c2bec512204
```

shane
2017-10-27 01:41
um ... well ... if you wait 10 mins, it'll probably be out of date ... :slightly_smiling_face: `tip` is moving really fast right now

wdennis
2017-10-27 01:46
OK, think that's still 3.1 stable... Looks like the tip install didn't work
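
The two version strings that appear in this log differ in the segment after the release number (`v3.1.0-0-...` versus `v3.1.0-tip-173-...`). A rough way to tell them apart, assuming that segment reads `tip` on tip builds and something else on stable builds (an inference from these two samples only; `release_channel` is a hypothetical helper):

```python
import re

# vMAJOR.MINOR.PATCH-<segment>-... where <segment> is "tip" on tip builds.
VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)-([A-Za-z0-9]+)-")


def release_channel(version_line):
    """Classify a dr-provision version string as 'tip' or 'stable'."""
    match = VERSION_RE.search(version_line)
    if match is None:
        raise ValueError("no version found in: " + version_line)
    return "tip" if match.group(4) == "tip" else "stable"
```

This only answers "stable or tip"; as noted above, whether a tip build is the latest one changes from hour to hour.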

wdennis
2017-10-27 01:47
Let me create a new directory ("drp-tip") and install into there...

shane
2017-10-27 01:49
did you install with `... install --drp-version=tip --isolated` ??

wdennis
2017-10-27 01:50
yes

wdennis
2017-10-27 01:50
Doesn't look like it overwrites existing binaries

shane
2017-10-27 01:50
only if you force it to

shane
2017-10-27 01:50
by default it tries to "be safe"

shane
2017-10-27 01:51
you can `curl` the `install.sh` script by itself without passing it through pipe to `bash` - then run `bash ./install.sh --help` to get help info

wdennis
2017-10-27 02:37
Running `bash ./install.sh --upgrade=true --isolated install --drp-version=tip` in my prior DRP stable isolated top-level directory, wish me luck...

wdennis
2017-10-27 02:40
Well, once that got done installing, tried to start up, and got this: ```[dradmin@dr-admin drp]$ sudo ./dr-provision --static-ip=192.168.1.148 --base-root=/home/dradmin/drp/drp-data --local-content="" --default-content="" --disable-dhcp [sudo] password for dradmin: dr-provision2017/10/26 22:22:45.919405 Version: v3.1.0-tip-173-92a761a0c2f910dc8dda1459345b525962d3c2af dr-provision2017/10/26 22:22:45.919515 Extracting Default Assets dr-provision2017/10/26 22:22:46.908363 Unable to create DataStack: fixBasic: cannot replace bootenvs:local: item in writable store not equal to static version map[Description: OptionalParams:<nil> Errors:<nil> OnlyUnknown:false Name:local Templates:[map[Path:pxelinux.cfg/{{.Machine.HexAddress}} ID:local-pxelinux.tmpl Contents: Name:pxelinux] map[Name:elilo Path:{{.Machine.HexAddress}}.conf ID:local-elilo.tmpl Contents:] map[Name:ipxe Path:{{.Machine.Address}}.ipxe ID:local-ipxe.tmpl Contents:]] Kernel: Initrds:<nil> BootParams: RequiredParams:<nil> Available:true OS:map[Codename: Version: IsoFile: IsoSha256: IsoUrl: Name:local Family:]] map[Name:local Initrds:[] RequiredParams:[] Available:false OptionalParams:[] Validated:false ReadOnly:false Meta:map[] OS:map[Family: Codename: Version: IsoFile: IsoSha256: IsoUrl: Name:local] Templates:[map[Name:pxelinux Path:pxelinux.cfg/{{.Machine.HexAddress}} ID: Contents:DEFAULT local PROMPT 0 TIMEOUT 10 LABEL local localboot 0 ] map[ID: Contents:exit Name:elilo Path:{{.Machine.HexAddress}}.conf] map[Name:ipxe Path:{{.Machine.Address}}.ipxe ID: Contents:#!ipxe exit ]] Kernel: Errors:[] Description:The boot environment you should use to have known machines boot off their local hard drive BootParams: OnlyUnknown:false]```

greg
2017-10-27 04:00
Okay we tried to be cool

greg
2017-10-27 04:01
`cd drp-data/digitalrebar/bootenvs`

greg
2017-10-27 04:01
`sudo rm local.json ignore.json`

greg
2017-10-27 04:03
We moved the required bootenvs into DRP itself as a content layer, but try to do a safety check. We think something has changed so we don't allow it. I suspect you haven't changed local and ignore bootenvs.

greg
2017-10-27 04:04
then try to restart.

wdennis
2017-10-27 13:27
Thanks @greg that did the trick

wdennis
2017-10-27 13:37
However, I still have the exact same errors on the Ubuntu bootenvs as before...

wdennis
2017-10-27 13:38
tried removing / re-adding the os-linux content via the UX, no change

wdennis
2017-10-27 13:46
Let me ask this question... If I start over with a fresh new DRP tip install, can I move my machines over from my existing DRP install? (Obvs I'd install new tip in another directory, running isolated)

greg
2017-10-27 13:48
You could move the machines directory over.

greg
2017-10-27 13:49
I'll try and see about the stuff.

wdennis
2017-10-27 13:52
My existing install directory has been existent since v3.0; upgraded from 3.0 --> 3.1 stable --> 3.1 tip, may have a lot of cruft built up by now...

greg
2017-10-27 13:52
Could be - I'm trying to make sure that content is working.

wdennis
2017-10-27 13:58
@greg - just copy the *.json files in drp-data/digitalrebar/machines/ over to new?

greg
2017-10-27 13:58
yes

shane
2017-10-27 13:58
stop DRP first

shane
2017-10-27 13:58
then copy

shane
2017-10-27 13:58
then start
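
The copy step can be sketched as below, to be run only while DRP is stopped on both sides. The `digitalrebar/machines` layout under the data root is the one mentioned in this thread; `copy_machines` is a hypothetical helper and does nothing about stopping or starting the service:

```python
import glob
import os
import shutil


def copy_machines(old_root, new_root):
    """Copy machine JSON files between two drp-data roots; return names copied."""
    src = os.path.join(old_root, "digitalrebar", "machines")
    dst = os.path.join(new_root, "digitalrebar", "machines")
    os.makedirs(dst, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(src, "*.json"))):
        shutil.copy2(path, dst)  # copy2 also preserves file timestamps
        copied.append(os.path.basename(path))
    return copied
```

Only `*.json` files are copied, in line with the earlier lesson about stray files in store directories.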

wdennis
2017-10-27 14:12
OK, did all that, *still* getting the bootenv errors for Ubuntu (`ubuntu-[14,16].04-install`) as well as CentOS (`centos-7.3.1611-install`)

shane
2017-10-27 14:13
@wdennis - we're investigating it ... plz standby

greg
2017-10-27 14:17
Oversight in the template naming convention. Will be a content update.

greg
2017-10-27 14:25
@wdennis - remove os-linux and os-discovery content and re-add it.

ctrees
2017-10-27 14:42
Quick question... CheCat keeps appearing in my user directory... I think it's associated with the Go compile when I followed shane's 5min-dr... is that right ?

shane
2017-10-27 14:43
"CheCat" ??

ctrees
2017-10-27 14:44
yea... I'm on a mac... it looks like it's some sort of go thing but can't figure out what it is... it may be something else...

ctrees
2017-10-27 14:44
the only go thing I did was for the terraform packet

ctrees
2017-10-27 14:47
BUT who knows... it might be an artifact from Che (https://www.eclipse.org/che/)

shane
2017-10-27 14:48
I don't do anything with 'Che' in it ...

shane
2017-10-27 14:48
you also do not need to do the "go get..." business anymore

shane
2017-10-27 14:48
note the README has been updated (that stuff has been removed)

shane
2017-10-27 14:49
the `terraform init` piece will correctly pull down the terraform-provider-packet plugin without go compile - the Packet folks got an updated/fixed version in to the Terraform repo finally

ctrees
2017-10-27 14:56
Cool... (no go compile)... and 'in theory' I should be able to morph your 5min to set up greg's 'vbox' demo also ? correct ? putting a dev on laptop then pushing that out to packet is sort of the 'golden stack push demo' I'm going for mailservices

shane
2017-10-27 14:57
virtualbox is fairly different - and for now ... the two (5min + vbox) shall remain separate

shane
2017-10-27 14:57
5min is designed heavily to orchestrate/control http://packet.net via terraform

ctrees
2017-10-27 14:59
ok... so greg was using ansible to talk to vbox ? or was that just IPMI

ctrees
2017-10-27 14:59
guess I'll go look at the video... thanks...

shane
2017-10-27 14:59
no - we use a DRP plugin that talks to vbox for "Machine Power Actions" (eg "ipmi-like" capabilities)

shane
2017-10-27 15:00
for vbox you have to do a little set up in advance because of vbox's limitations

ctrees
2017-10-27 15:02
the DRP plugin is v3.2 tip beta ?

shane
2017-10-27 15:03
3.2 doesn't exist yet - we're closing in on cutting that release soon

greg
2017-10-27 15:06
@ctrees - I didn't automate or create script calls to vbox to make the machines like the system is doing in packet.

greg
2017-10-27 15:06
Also, I started with an "installed" DRP where 5min builds its own.

ctrees
2017-10-27 15:08
got it, it was a demo of the 'ipmi-like' plugin

greg
2017-10-27 15:08
Yeah - the scripts that drive through drpcli should function the same.

ctrees
2017-10-27 20:19
@shane what did I miss (looks like an env thing)

ctrees
2017-10-27 20:20
--------------------------------------------------------------------------------
ACTION :: terraform apply -target=packet_ssh_key.drp-ssh-key
Run next step? [ <Enter> | No | Ctrl-C ]
--------------------------------------------------------------------------------
Plugin reinitialization required. Please run "terraform init".
Reason: Could not satisfy plugin requirements.

Plugins are external binaries that Terraform uses to access and manipulate
resources. The configuration provided requires plugins which can't be located,
don't satisfy the version constraints, or are otherwise incompatible.

1 error(s) occurred:

* provider.packet: no suitable version installed
  version requirements: "~> 1.0"
  versions installed: "0.0.0"

Terraform automatically discovers provider requirements from your
configuration, including providers used in child modules. To see the
requirements and constraints from each module, run "terraform providers".

Error: error satisfying plugin requirements
FAILED
--------------------------------------------------------------------------------
ACTION :: terraform apply -target=packet_ssh_key.machines-ssh-key
Run next step? [ <Enter> | No | Ctrl-C ]

shane
2017-10-27 20:21
`terraform init` is run via `demo-run.sh`, during the `terraform-install` stage

ctrees
2017-10-27 20:22
You want the full log in a snippet ?

shane
2017-10-27 20:22
check your `~/.terraformrc` file to make sure there aren't a whole bunch of incorrect plugin configs to non-existent plugin location for the `terraform-provider-packet`

ctrees
2017-10-27 20:23
catmini:5min-drp cat$ vi ~/.terraformrc providers { packet = "/Users/cat/CodeOps/5min-drp/bin/terraform-provider-packet" } providers { packet = "/Users/cat/CodeOps/5min-drp/bin/terraform-provider-packet" } providers { packet = "/Users/cat/CodeOps/5min-drp/bin/terraform-provider-packet" }

shane
2017-10-27 20:23
after that - the `terraform-provider-packet` should have been installed in something like `./.terraform/plugins/darwin_amd64/` (substitute darwin... for correct OS/arch)

shane
2017-10-27 20:23
yeah - that's left over cruft from your previous runs

ctrees
2017-10-27 20:24
should I just nuke .terraformrc ?

shane
2017-10-27 20:24
remove those - and if you have nothing else in that file - you can nuke it

shane
2017-10-27 20:24
it's poor idempotency handling on my part with multiple runs of the tool - and the .terraformrc file
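
One way to avoid piling up duplicate entries like the ones above is to append the `providers` block only when the file does not already mention the provider. This is a minimal sketch, not the actual `demo-run.sh` logic; the helper name `add_provider_once` is made up for illustration.

```shell
# Append a providers block to a terraformrc-style file only if the
# provider is not already listed, so repeated runs leave one entry.
# add_provider_once is a hypothetical helper, not part of 5min-drp.
add_provider_once() {
    rcfile=$1
    name=$2
    path=$3
    # Match "name =" at line start, after whitespace, or after "{".
    if [ -f "$rcfile" ] && grep -Eq "(^|[[:space:]{])${name}[[:space:]]*=" "$rcfile"; then
        return 0    # already present, nothing to do
    fi
    printf 'providers {\n  %s = "%s"\n}\n' "$name" "$path" >> "$rcfile"
}

# Example (commented out so nothing is modified by accident):
# add_provider_once "$HOME/.terraformrc" packet "$PWD/bin/terraform-provider-packet"
```

Calling the helper several times with the same provider name should leave a single block in the file.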

ctrees
2017-10-27 20:25
running now...

shane
2017-10-27 20:25
that left over was required because the v0.10.0 plugin didn't have the right http://packet.net API call capabilities in it - so you had to go compile a version - since terraform does not maintain plugins for beta/non-release - so `terraform init` won't get the right version

shane
2017-10-27 20:26
make sure after the `terraform-install` stage completes - that you get the plugin as mentioned above

wdennis
2017-10-27 20:37
@greg Removed it thru UX, but then when went to re-add, got this error: ```Content Upload Failed: ValidationError New layer violates key restrictions: keysCannotBeOverridden: runner.tmpl is already in layer 1 keysCannotBeOverridden: access-keys.sh.tmpl is already in layer 1 keysCannotBeOverridden: change-stage.sh.tmpl is already in layer 1```

wdennis
2017-10-27 20:37
Sorry, that was on `os-discovery`

wdennis
2017-10-27 20:39
And got: ```Content Upload Failed: ValidationError New layer violates key restrictions: keysCannotBeOverridden: ubuntu-16.04-install is already in layer 1 keysCannotBeOverridden: debian-8-install is already in layer 1```

greg
2017-10-27 20:39
not sure.

wdennis
2017-10-27 20:39
when tried to transfer `os-linux`

greg
2017-10-27 20:39
Let me finish content.

wdennis
2017-10-27 20:45
Any ETA? How will I know when available?

greg
2017-10-27 20:47
I'm going to announce. Because I don't understand what you have.

greg
2017-10-27 20:48
You may need to start over, but I'm not sure.

wdennis
2017-10-27 20:50
Let me know when to nuke & repave?

greg
2017-10-27 20:53
ok

ctrees
2017-10-27 21:06
so does the machine in the demo script stay at sledgehammer ? Or the CentOS image didn't go into the endpoint

ctrees
2017-10-27 21:06
147.75.64.7

ctrees
2017-10-27 21:06
is the endpoint

shane
2017-10-27 21:07
the work flow should take the Machine through to OS installed - what OS type did you set it to ?

ctrees
2017-10-27 21:07
all defaults

shane
2017-10-27 21:08
ah

ctrees
2017-10-27 21:08
just a sec... I can look up in the script... I just saw that...

shane
2017-10-27 21:08
it probably failed CentOS install

shane
2017-10-27 21:08
the centos folks yanked the 7.3.1611 ISOs off of the repos

shane
2017-10-27 21:08
we're working on a 7.4 content update to fix the issue

ctrees
2017-10-27 21:08
ah... so it makes sense...

ctrees
2017-10-27 21:09
I'm just happy I'm following enough to notice...

shane
2017-10-27 21:09
if you take a look at Profiles --> Global

shane
2017-10-27 21:09
you'll see the stagemap that gets installed - and it's specifying the centos install

shane
2017-10-27 21:09
:slightly_smiling_face:

shane
2017-10-27 21:10
so - you can use this as a "learning" exercise now ... :slightly_smiling_face:

shane
2017-10-27 21:10
you'd need to upload an Ubuntu 16 ISO

shane
2017-10-27 21:10
you can either do that via "drpcli" from your laptop - setting your Endpoint correctly - or you can do it locally on the DRP Endpoint itself

shane
2017-10-27 21:11
the `drpcli` binary will point to 127.0.0.1:8092 by default - so you have to change who it talks to

shane
2017-10-27 21:12
from your laptop you can do: `drpcli -E "https://147.75.64.7:8092" bootenvs list`

shane
2017-10-27 21:12
to list the bootenvs

shane
2017-10-27 21:12
(json blob) - you can append `| jq .` to get pretty print

shane
2017-10-27 21:12
if you have jq on your local laptop

shane
2017-10-27 21:12
now do: `drpcli -E "https://147.75.64.7:8092" bootenvs show centos-7.3.1611-install`

shane
2017-10-27 21:13
to see *just* the centos bootenv

shane
2017-10-27 21:13
```shane@gala:~/5min-drp$ drpcli -E "https://147.75.64.7:8092" bootenvs show centos-7.3.1611-install | jq '.OS.IsoUrl' "http://mirrors.kernel.org/centos/7.3.1611/isos/x86_64/CentOS-7-x86_64-Minimal-1611.iso"```

shane
2017-10-27 21:14
to use `jq` to grab JUST the IsoUrl location ... now if we go (manually) check out the mirror HTTP location - we see that it's been yanked and a bare `readme` dropped in place, telling us it's been yanked (we have to "walk" up the HTTP path to find the readme ... thank you ... centos team ...)
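
The "walk up the HTTP path" step can be scripted rather than done by hand. A small sketch, with `parent_urls` as an illustrative helper name: it prints each parent directory of a URL, nearest first, so each one can then be checked for a readme.

```shell
# Print every parent directory URL of the given URL, nearest first.
# parent_urls is a hypothetical helper for illustration only.
parent_urls() {
    url=${1%/}
    # Keep only scheme://host so the loop knows where to stop.
    scheme_host=$(printf '%s\n' "$url" | sed 's|^\([a-z]*://[^/]*\).*|\1|')
    while [ "$url" != "$scheme_host" ]; do
        url=${url%/*}
        printf '%s/\n' "$url"
    done
}

# Example: list the directories above the yanked ISO, then probe each one.
# parent_urls "http://mirrors.kernel.org/centos/7.3.1611/isos/x86_64/CentOS-7-x86_64-Minimal-1611.iso" |
#     while read -r u; do echo "$u"; curl -sI "$u" | head -n 1; done
```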

shane
2017-10-27 21:14
now - you want to upload the Ubuntu ISO for now if you don't care which OS we hit it with

shane
2017-10-27 21:15
`drpcli -E "https://147.75.64.7:8092" bootenvs uploadiso ubuntu-16.04-install`

shane
2017-10-27 21:17
NOTE: you can do `export RS_ENDPOINT="https://147.75.64.7:8092"` - then you don't have to specify the `-E` (or equivalent `--endpoint`) flag
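
As a convenience, the export can be wrapped in a tiny function. This is only a sketch: `use_endpoint` is a made-up name, and the `https` scheme and default port 8092 are taken from the commands in this conversation.

```shell
# Point drpcli at a given Endpoint for the rest of the shell session.
# use_endpoint is a hypothetical helper; 8092 is the port used above.
use_endpoint() {
    RS_ENDPOINT="https://$1:${2:-8092}"
    export RS_ENDPOINT
}

use_endpoint 147.75.64.7
echo "$RS_ENDPOINT"
# After this, "drpcli bootenvs list" needs no -E flag.
```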

ctrees
2017-10-27 21:17
and that's the iso defined by the boot environment which points to http://mirrors.kernel.org/ubuntu-releases/16.04/ubuntu-16.04.3-server-amd64.iso

shane
2017-10-27 21:17
exactly

shane
2017-10-27 21:18
the "uploadiso" is just a helper function to help download and inject the ISO in to the tftpboot structure (it'll call `explode_iso.sh` on the local DRP Endpoint to ... explode out ... the ISO contents appropriately)

shane
2017-10-27 21:19
if you have an appropriately named ISO file - you can copy it in to the `~/drp-data/tftpboot/isos/` directory on Endpoint

shane
2017-10-27 21:19
and either restart, or `kill -HUP <pid>` of the dr-provision service

shane
2017-10-27 21:19
that will trigger an `explode_iso.sh` on the endpoint

shane
2017-10-27 21:20
also - we probably need to ... either re-run the 5min stuff and use `tip` version - or you need to do an inplace upgrade of the current 5min endpoint

ctrees
2017-10-27 21:20
does the drpcli bootenvs uploadiso trigger the explode ?

shane
2017-10-27 21:20
yep ... well, technically ... no ... but yes

shane
2017-10-27 21:20
:slightly_smiling_face:

shane
2017-10-27 21:21
the uploadiso pushes the ISO in place, and then the dr-provision endpoint calls the explode_iso.sh - so indirectly it happens because of the uploadiso run

shane
2017-10-27 21:21
but technically ... "drpcli bootenvs uploadiso ..." doesn't actually call the "explode_iso.sh"

ctrees
2017-10-27 21:22
but it restarts drp ?

ctrees
2017-10-27 21:22
dr-provision service

shane
2017-10-27 21:22
it signals dr-provision appropriately

shane
2017-10-27 21:22
it doesn't restart it

shane
2017-10-27 21:23
if you take a look at your "bootenvs" page on the UX

ctrees
2017-10-27 21:23
got it... and then it shows up in the UX

shane
2017-10-27 21:23
you'll see that the "ubuntu..." bootenv is marked "good" now

shane
2017-10-27 21:23
the blue check mark means good

shane
2017-10-27 21:24
now we have to modify the stagemap to use Ubuntu instead of Centos

shane
2017-10-27 21:24
you can do it via UX - or I can walk you through the CLI

ctrees
2017-10-27 21:25
so is that the 'copy' in the UX (cause the Stages are locked) thing ?

shane
2017-10-27 21:27
if stages are locked - you need to log in with a RackN account (upper Right)

shane
2017-10-27 21:27
that unlocks stages/workflow

shane
2017-10-27 21:27
however - the stagemap itself is in "profiles" --> "global"

shane
2017-10-27 21:27
in this case I just (lazily) inject the stagemap in to the "global" profile - so every Machine will be subject to it

ctrees
2017-10-27 21:28
I did (I think) login

shane
2017-10-27 21:28
in a "proper" setup, you'd create a new profile ... maybe "global-ubuntu" or something - that has the stagemap configuration

shane
2017-10-27 21:28
then apply that Profile to the machines you want to get Ubuntu installed

ctrees
2017-10-27 21:28
lets do the CLI (and I'll look at the UX) :wink:

shane
2017-10-27 21:29
you could "clone" the "global" Profile as "global-centos", then make the changes to the current "global" to use the Ubuntu BootEnv (ubuntu-16.04-install)

shane
2017-10-27 21:29
sure easy enough - and the right answer :slightly_smiling_face:

ctrees
2017-10-27 21:30
yea... and if you're busy... I can figure it out... but appreciate the CLI command guidance if you're not... :wink:

shane
2017-10-27 21:30
create a JSON blob (maybe call it "global-ubuntu-stagemap.json" or something) - with the following: ``` { "Available": true, "Description": "packet-map", "Name": "global", "Params": { "change-stage/map": { "discover": "packet-discover:Success", "packet-discover": "${MACHINES_OS}:Reboot", "packet-ssh-keys": "complete-nowait:Success", "${MACHINES_OS}": "packet-ssh-keys:Success" } } }```

shane
2017-10-27 21:31
replace the BASH variables (MACHINES_OS) with "ubuntu-16.04-install"
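
To avoid editing the variable by hand, the same blob can be generated with the stage name filled in. A sketch only: `write_stagemap` is an illustrative function, and the JSON is copied from the blob above with the shell variable expanded.

```shell
# Emit the "global" profile JSON with the OS install stage substituted.
# write_stagemap is a hypothetical helper; the JSON mirrors the blob above.
write_stagemap() {
    os_stage=$1
    cat <<EOF
{
  "Available": true,
  "Description": "packet-map",
  "Name": "global",
  "Params": {
    "change-stage/map": {
      "discover": "packet-discover:Success",
      "packet-discover": "${os_stage}:Reboot",
      "packet-ssh-keys": "complete-nowait:Success",
      "${os_stage}": "packet-ssh-keys:Success"
    }
  }
}
EOF
}

# Usage:
# write_stagemap ubuntu-16.04-install > global-ubuntu-stagemap.json
```

The output should contain no `$` or curly-brace variable syntax, only the literal stage name.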

shane
2017-10-27 21:31
I'm going to assume you run "drpcli" locally on Endpoint, or you set the RS_ENDPOINT variable

shane
2017-10-27 21:32
`drpcli profiles show global` # dump the current global profile

shane
2017-10-27 21:32
you can redirect that to a JSON file, and modify that ... or just save it "as a backup"

shane
2017-10-27 21:32
the easiest solution ... path of least resistance for now

shane
2017-10-27 21:32
is to just "destroy" the profile named "global"

shane
2017-10-27 21:33
then recreate it with the new JSON blob definition

shane
2017-10-27 21:33
`drpcli profiles destroy global` `drpcli profiles create - < global-ubuntu-stagemap.json`
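
The backup, destroy and create steps can be chained so that the destroy only happens once a backup was written. A rough sketch using only the `drpcli` subcommands shown above; `replace_profile` and the backup file name are made up for illustration.

```shell
# DRPCLI can be overridden (for example DRPCLI=echo) for a dry run.
DRPCLI=${DRPCLI:-drpcli}

# replace_profile is a hypothetical wrapper around the commands above.
replace_profile() {
    name=$1
    new_json=$2
    backup="${name}-backup.json"
    # Keep a copy of the current definition before destroying it.
    "$DRPCLI" profiles show "$name" > "$backup" || return 1
    "$DRPCLI" profiles destroy "$name" || return 1
    "$DRPCLI" profiles create - < "$new_json"
}

# Usage, assuming RS_ENDPOINT is set or drpcli runs on the Endpoint:
# replace_profile global global-ubuntu-stagemap.json
```

Stopping at the first failure means a failed `show` never leads to a destroyed profile without a backup.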

shane
2017-10-27 21:34
now if you do the "show" again - it should be "changed"

shane
2017-10-27 21:36
Note that the "Name" key in the JSON defines the Profile ... ahem ... name

shane
2017-10-27 21:36
so in this case you want to make sure you don't change it from "global"

shane
2017-10-27 21:36
(unless you mean to :slightly_smiling_face: )

shane
2017-10-27 21:38
@ctrees you've got an error in your JSON ... :slightly_smiling_face:

ctrees
2017-10-27 21:38
woops..


ctrees
2017-10-27 21:39
I don't want the var redirect...

ctrees
2017-10-27 21:39
?

shane
2017-10-27 21:39
no ... that was just a construct from the BASH script in 5min-drp

shane
2017-10-27 21:39
remove the dollar and curly braces

ctrees
2017-10-27 21:39
and I thought I was being 'smart'

shane
2017-10-27 21:40
see the existing "centos" stagemap

ctrees
2017-10-27 21:40
yea.. saw it when I uploaded...

ctrees
2017-10-27 21:42
now... I have to activate the stage (ubuntu-16.04-install) correct ?

shane
2017-10-27 21:42
nope

shane
2017-10-27 21:42
"global" profile is "activated" ... by default ... globally ...

shane
2017-10-27 21:44
if you take a look at the Machine in the UX

ctrees
2017-10-27 21:44
OH...

shane
2017-10-27 21:44
click on your single Machine there - you'll see the "centos" error message about the BootEnv not being available in the "Stage"

ctrees
2017-10-27 21:44
yup... saw that... so now enable ?

shane
2017-10-27 21:45
but - you see the "gohai" inventory - so sledgehammer ran fine, and got the Inventory report

shane
2017-10-27 21:45
then it failed to transition to "centos"

shane
2017-10-27 21:45
you can do a couple things ... delete the "Machine" from DRP itself, and restart it - it'll re-PXE and start over, and go with Ubuntu now

shane
2017-10-27 21:47
or you can "Edit" the Machine - and put it back in to "discover" stage

shane
2017-10-27 21:47
set it to "Runnable", then save it

shane
2017-10-27 21:47
once done, reboot it

shane
2017-10-27 21:48
either via the Packet UX, or via the DRP Machines panel - "reboot" action

shane
2017-10-28 02:39
Too late.... I saw it!

wdennis
2017-10-28 02:39
wrong screen! trying to change tmux panes, wondering why not working?

wdennis
2017-10-28 02:42
But while I have you? does this look right (doing a install just via a stage) ```[GIN] 2017/10/27 - 18:24:26 | 204 | 290.217µs | 192.168.1.143 | POST /api/v3/jobs [GIN] 2017/10/27 - 18:24:26 | 200 | 383.925µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:26 | 200 | 349.233µs | 192.168.1.143 | GET /api/v3/stages/debian-9-install [GIN] 2017/10/27 - 18:24:31 | 200 | 444.117µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:31 | 200 | 4.006366ms | 192.168.1.143 | GET /api/v3/ws [GIN] 2017/10/27 - 18:24:31 | 204 | 378.599µs | 192.168.1.143 | POST /api/v3/jobs [GIN] 2017/10/27 - 18:24:31 | 200 | 382.419µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:31 | 200 | 360.142µs | 192.168.1.143 | GET /api/v3/stages/debian-9-install [GIN] 2017/10/27 - 18:24:36 | 200 | 449.955µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:36 | 200 | 2.224572ms | 192.168.1.143 | GET /api/v3/ws [GIN] 2017/10/27 - 18:24:36 | 204 | 459.116µs | 192.168.1.143 | POST /api/v3/jobs [GIN] 2017/10/27 - 18:24:36 | 200 | 583.233µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:36 | 200 | 348.412µs | 192.168.1.143 | GET /api/v3/stages/debian-9-install [GIN] 2017/10/27 - 18:24:41 | 200 | 445.652µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:41 | 200 | 2.135051ms | 192.168.1.143 | GET /api/v3/ws [GIN] 2017/10/27 - 18:24:41 | 204 | 295.203µs | 192.168.1.143 | POST /api/v3/jobs [GIN] 2017/10/27 - 18:24:41 | 200 | 646.374µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:41 | 200 | 402.289µs | 192.168.1.143 | GET /api/v3/stages/debian-9-install [GIN] 2017/10/27 - 18:24:46 | 200 | 489.889µs | 192.168.1.143 | GET 
/api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:46 | 200 | 2.295319ms | 192.168.1.143 | GET /api/v3/ws [GIN] 2017/10/27 - 18:24:46 | 204 | 297.802µs | 192.168.1.143 | POST /api/v3/jobs [GIN] 2017/10/27 - 18:24:46 | 200 | 402.409µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:46 | 200 | 380.674µs | 192.168.1.143 | GET /api/v3/stages/debian-9-install [GIN] 2017/10/27 - 18:24:51 | 200 | 468.003µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:51 | 200 | 2.199664ms | 192.168.1.143 | GET /api/v3/ws [GIN] 2017/10/27 - 18:24:51 | 204 | 388.935µs | 192.168.1.143 | POST /api/v3/jobs [GIN] 2017/10/27 - 18:24:51 | 200 | 387.637µs | 192.168.1.143 | GET /api/v3/machines/5fcbf69d-287e-4c2c-b085-5858665cd442 [GIN] 2017/10/27 - 18:24:51 | 200 | 382.491µs | 192.168.1.143 | GET /api/v3/stages/debian-9-install```

wdennis
2017-10-28 02:42
looks like some sort of loop?

shane
2017-10-28 02:43
Stupid technology.... Should know what you _meant_ to do...

wdennis
2017-10-28 02:46

2017-10-28 02:46
@wdennis commented on @wdennis's file https://rackn.slack.com/files/U416T0AAX/F7S83CMSA/Status_: nothing I've done for days is working?

shane
2017-10-28 02:56
looks like a loop

shane
2017-10-28 02:56
what's your stagemap look like ?

greg
2017-10-28 03:05
That is the pattern of a loop of the runner waiting for more jobs.
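
A log like the one above can be checked for this idle pattern by counting the `POST /api/v3/jobs` requests that came back 204. A rough sketch, assuming one GIN entry per line and assuming that a 204 here means the runner was handed no job; `count_idle_polls` is an illustrative name.

```shell
# Count "POST /api/v3/jobs" requests answered with 204 in a GIN access log.
# Reads the named files, or stdin when no file is given.
# count_idle_polls is a hypothetical helper for illustration.
count_idle_polls() {
    awk -F'|' '
        $2 + 0 == 204 && $5 ~ /POST[ \t]+\/api\/v3\/jobs/ { n++ }
        END { print n + 0 }
    ' "$@"
}

# Usage:
# count_idle_polls dr-provision.log
```

A steady count that grows by one every few seconds, with no other job activity, matches a runner that is simply waiting.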

ctrees
2017-10-31 14:44
So... I'm attempting @greg's VBox setup and noticed that route is sending ALL broadcast to the vboxnet0 virtual IP (192.168.100.1 in greg's case) ?? correct ?? (so I need to remove my existing broadcast route to my eth0 IP) ?? correct ??

greg
2017-10-31 14:49
Yes.

greg
2017-10-31 14:50
The Mac OS X kernel is a little strange about it. It uses routes over IP out-bound. Strange to my thinking, but okay.

ctrees
2017-10-31 14:52
oh apple always sucks at networking AND they don't follow an established pattern

ctrees
2017-10-31 14:53
but they 'think' they know

ctrees
2017-10-31 14:54
... at least it's some form of nx now...

greg
2017-10-31 14:56
I think they are a little hamstrung by their stack choice. They are using the BSD-based stack from FreeBSD.

ctrees
2017-10-31 14:56
where's the grumpy old man icon anyway...

greg
2017-10-31 14:56
Yeah- get off my lawn and use a common stack. :slightly_smiling_face:

greg
2017-10-31 14:57
I know, I know. FreeBSD is the original networking stack. It has some niceties but also some pains.

zehicle
2017-10-31 14:57
you could be using Linux on the Laptop like us crazy people

zehicle
2017-10-31 14:57
speaks for @vlowther too

ctrees
2017-10-31 14:58
oh sure... take the easy way out (networking wise)

vlowther
2017-10-31 14:58
Been my primary OS since the nineties.

vlowther
2017-10-31 14:59
On laptops since the mid 2000s.

vlowther
2017-10-31 15:01
I remember the sk_buff vs. mbuf wars we used to have with the bsdites...

vlowther
2017-10-31 15:02
Heck, these days I don't even have to exhaustively research whether a piece of kit is Linux compatible!

wdennis
2017-10-31 15:33
grants @vlowther his neckbeard badge

vlowther
2017-10-31 15:33
Alas, the men of my family have a hard time growing neckbeards. :slightly_smiling_face:

vlowther
2017-10-31 15:34
I doubt I will be able to grow a proper unix wizard beard until my 70's

lae
2017-10-31 15:53
i'm a millennial so I didn't start using Linux as my primary OS until mid 2000s :sweat_smile:

lae
2017-10-31 16:13
@greg how do I build my own content bundle?

greg
2017-10-31 16:13
magic

greg
2017-10-31 16:13
oh and drpcli

greg
2017-10-31 16:13
or drbundle

lae
2017-10-31 16:14
drbundle?

greg
2017-10-31 16:14
yeah - the build env has a new tool that can be built without using swagger.

greg
2017-10-31 16:14
you need go 1.9 and the like.

greg
2017-10-31 16:14
Or you can get tip and use `drpcli contents bundle` ...

greg
2017-10-31 16:14
I'm trying to push out a content rework.

lae
2017-10-31 16:15
so I have a similar repo as `digitalrebar/provision-content`, but I'm not sure if I need to set any metadata or anything before using drpcli

greg
2017-10-31 16:15
It will be unspecified if you don't. The cloned tree has some values in ._<key>.meta files.

lae
2017-10-31 16:15
(as of now I've just been updating bootenvs/templates by hand for each change)

greg
2017-10-31 16:16
A cleaner way is to build a content bundle with a changed version file and upload it.

greg
2017-10-31 16:16
._Version.meta is the version file.

greg
2017-10-31 16:16
I need to finish a test and there should be new tip content with better examples of all this shortly.

lae
2017-10-31 16:18
oh, the content-reorg branch

greg
2017-10-31 16:18
yes

greg
2017-10-31 16:19
most of os-linux and os-discovery are coming into the community with stages and tasks.

greg
2017-10-31 16:19
The ce-* is going away. It was just plain silly and confusing.

greg
2017-10-31 16:19
Your changes are integrated in.

lae
2017-10-31 16:20
mhm, noticed

greg
2017-10-31 16:21
tracking stages and bootenvs are added for updates. I've tried to make the stages work like the bootenvs of ce-*

greg
2017-10-31 16:21
ssh-access and local-repos are in the stages by default.

greg
2017-10-31 16:21
Using the stages should work like the bootenvs did.

ctrees
2017-10-31 16:21
SO... I went through @shane's DRPv3 Training: Installation and DRP Training: Configuration, then attempted to get to @greg's VBOX demo (aka the network question)... now I'm attempting to use the 'store' to load up the things missing from DRPv3 Training: Install / Config.... that I saw in greg's demo

greg
2017-10-31 16:22
One change will be that the bootenvs won't do things automagically for you. Stages will need to be used to pull in the default tasks.

greg
2017-10-31 16:22
you can chain stages if you had added custom tasks before.

greg
2017-10-31 16:23
@ctrees - sounds good, what is your question?

lae
2017-10-31 16:23
`drpcli machines processjobs UUID` is this what runs the stages?

greg
2017-10-31 16:23
It runs the tasks

greg
2017-10-31 16:23
as set by the stages.

ctrees
2017-10-31 16:24
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7TK9CZHC/freshdrponmac.png and commented: No Default Stage

lae
2017-10-31 16:24
ah ok

ctrees
2017-10-31 16:24
Is that correct ?

greg
2017-10-31 16:24
creating jobs as the manifestation of the run.

ctrees
2017-10-31 16:24
No default stages... should I add all that from 'the store' ?

greg
2017-10-31 16:24
@ctrees - you need to set either the discover stage or the sledgehammer default bootenv.

ctrees
2017-10-31 16:24
or is that because of centos moving the iso link ?

greg
2017-10-31 16:25
You also need discovery unknown bootenv to start the whole process.

greg
2017-10-31 16:25
If you look at the bootenvs screen, you should see checks or Xs for the available bootenvs.

greg
2017-10-31 16:25
You may need to update the sledgehammer ISO.

greg
2017-10-31 16:26
Also you are on stable with tip content I suspect. YMMV.

ctrees
2017-10-31 16:26
I did the sledge update (yesterday)

greg
2017-10-31 16:26
depends upon from which content. i need a better way for this.

greg
2017-10-31 16:26
check the sledgehammer or ce-sledgehammer bootenvs for errors.

greg
2017-10-31 16:26
If you updated content, you pick up a new requirement.

ctrees
2017-10-31 16:28
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7TKCPCKG/boot_environments.png and commented: Boot Envs (centos failed due to mirror move)

greg
2017-10-31 16:28
@lae - what do you think about the reorg?

greg
2017-10-31 16:29
@ctrees - you will need the os-discovery and os-linux content packs. And the virtualbox-ipmi plugin. Log into the RackN portal to get access to them.

ctrees
2017-10-31 16:31
yea I was doing that when I decided I should check-in... I'll go load those now...

greg
2017-10-31 16:31
okay - cool

ctrees
2017-10-31 16:39
so... what 'magically' loaded the stages (I had no stages in the GUI... now there is lots)...

lae
2017-10-31 16:39
@greg better

ctrees
2017-10-31 16:39
? part of packages ?

greg
2017-10-31 16:40
The UX goes to the RackN portal to get content bundles and then uses the API to inject them into the DRP instance.

lae
2017-10-31 16:40
I need to look into it more but it looks like it'll work in our favour for less duplication

greg
2017-10-31 16:40
@ctrees - content packages or bundles are just collections of related objects that are imported into DRP as read-only content.

lae
2017-10-31 16:40
(i.e. I have a template that configures a local user's SSH and sudo rather than root user to use in our ubuntu/etc bootenvs, which are themselves separate bootenvs)

greg
2017-10-31 16:41
@lae that is the hope. I realize it will have some impact if you have a bootenv with a lot of custom templates injected, but the hope is to move those to tasks and stages to get chained together for reuse.

greg
2017-10-31 16:41
Makes sense.

greg
2017-10-31 16:47
There is now a contrib content tree for those kinda things as well.

ctrees
2017-10-31 16:48
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7STB8YUV/workflow_cannotaccess.png and commented: What did I do wrong ?

zehicle
2017-10-31 16:48
Refresh

ctrees
2017-10-31 16:49
OH... thanks...

ctrees
2017-10-31 16:49
nope

zehicle
2017-10-31 16:49
AND, it now checks feature flags

zehicle
2017-10-31 16:50
May be a bug that's hiding it for older versions.

zehicle
2017-10-31 16:50
What version?

ctrees
2017-10-31 16:50
tip I thought.... but how should I confirm ?

zehicle
2017-10-31 16:51
System config tools will show you

ctrees
2017-10-31 16:52
3.1.... guess I'll update

greg
2017-10-31 16:52
yeah - make sure to run --force on the install script to get the latest tip.

ctrees
2017-10-31 16:52
ok thanks...

greg
2017-10-31 16:52
and --drp-version=tip too

ctrees
2017-10-31 16:53
so is it really better to pull the repo and run that install.sh or ??

ctrees
2017-10-31 16:53
let the install pull based on cmd line

ctrees
2017-10-31 16:54
... guessing it does not matter other than I used the curl .../stable from the slides... ?? right ??

ctrees
2017-10-31 16:54
... I'll keep doing the curl just to test that process...

greg
2017-10-31 16:55
okay - note that tip install.sh still installs stable unless you give it `--drp-version=tip`

ctrees
2017-10-31 16:56
yea... I know I did that one of the times... but obviously I didn't do it my last run... confirmed in my bash history too...

greg
2017-10-31 16:57
okay - @shane has been making the install.sh safer and so `--force` is required to make the override of an already installed environment.

ctrees
2017-10-31 16:59
and shane is gatekeeper of get.rebar.digital/stable ?? correct ?? so best to use that for 'dev setup'

ctrees
2017-10-31 17:00
dev setup demos that is (stuff going to dev to mod and verify deploy configs)

greg
2017-10-31 17:02
That should point to the `stable` release in github. We move the `stable` branch as we release new versioned releases and let tip float head on the tip of `master`

greg
2017-10-31 17:02
So, when we release `v3.2.0`, we'll reset `stable` to that.

ctrees
2017-10-31 17:03
Oh... so I shouldn't use that

greg
2017-10-31 17:03
you can also do `--drp-version=v3.1.0` for a specific release if you wish.

ctrees
2017-10-31 17:05
catmini:CodeOps cat$ curl -s get.rebar.digital/stable | bash -s -- --isolated --dpr-version=tip --force install

greg
2017-10-31 17:05
checking real quick

shane
2017-10-31 17:06
you can also specify "tip" instead of "stable" on the trailing curl call

shane
2017-10-31 17:06
so: `curl -s get.rebar.digital/tip | ... `

greg
2017-10-31 17:06
cool safer

shane
2017-10-31 17:06
that gets the latest installer - the "stable" installer doesn't have the newer updated safety checks

ctrees
2017-10-31 17:06
ok so catmini:CodeOps cat$ curl -s get.rebar.digital/tip | bash -s -- --isolated --dpr-version=tip --force install

ctrees
2017-10-31 17:07
(probably don't need the --drp-version=tip)

shane
2017-10-31 17:07
you do still need it

ctrees
2017-10-31 17:07
ok... thanks

shane
2017-10-31 17:07
default is stable - the "installer" is separate from "what gets installed"

ctrees
2017-10-31 17:15
humm... seems like I got 3.1

johnsutten
2017-10-31 17:15
Is there a way to install DR and then connect other nodes via command line and not use pxe ?

greg
2017-10-31 17:17
@johnsutten - The short answer is no. But really, yes and no. We are working on that now. There is a content pack that is coming to set up some of that, but we aren't quite there yet. You can do it by creating a machine, setting the machine's IP in the machine object, setting the bootenv to local and the stage to none or complete-wait. You can then get the drpcli and run the runner. It then functions like a node that was installed and sitting in a runner.

greg
2017-10-31 17:17
@ctrees - what does `./dr-provision --version` show

ctrees
2017-10-31 17:18
Oh... I think I typoed... did /stable again... redoing

greg
2017-10-31 17:18
oh - stupid eyes. you did `dpr-version`

greg
2017-10-31 17:18
I missed it

ctrees
2017-10-31 17:19
hum....

ctrees
2017-10-31 17:19
catmini:~ cat$ mkdir CodeOps catmini:~ cat$ cd CodeOps/ catmini:CodeOps cat$ curl -s get.rebar.digital/tip | bash -s -- --isolated --dpr-version=tip --force install Overriding DPR_VERSION with tip 'dr-provision' service is not running, beginning install process ... Ensuring required tools are installed Installing Version stable of Digital Rebar Provision dr-provision.zip: OK ./bin/linux/amd64/incrementer: OK ./bin/linux/amd64/dr-provision: OK ./bin/linux/amd64/drpcli: OK ./bin/darwin/amd64/incrementer: OK ./bin/darwin/amd64/dr-provision: OK ./bin/darwin/amd64/drpcli: OK ./bin/windows/amd64/incrementer: OK ./bin/windows/amd64/dr-provision: OK ./bin/windows/amd64/drpcli: OK ./assets/startup/dr-provision.service: OK ./assets/startup/dr-provision.sysv: OK ./assets/startup/dr-provision.unit: OK ./tools/install.sh: OK Installing Version stable of Digital Rebar Provision Community Content drp-community-content.yaml: OK # Run the following commands to start up dr-provision in a local isolated way. # The server will store information and serve files from the drp-data directory. sudo ./dr-provision --static-ip=192.168.1.200 --base-root=/Users/cat/CodeOps/drp-data --local-content="" --default-content="" & # Once dr-provision is started, these commands will install the isos for the community defaults ./drpcli bootenvs uploadiso ubuntu-16.04-install ./drpcli bootenvs uploadiso centos-7-install ./drpcli bootenvs uploadiso sledgehammer catmini:CodeOps cat$ ./dr-provision --version dr-provision2017/10/31 17:18:58.575268 Version: v3.1.0-0-b70cf8ee1f61844a6d64070a8b272c2bec512204 catmini:CodeOps cat$

greg
2017-10-31 17:20
hmmm

greg
2017-10-31 17:20
checking

greg
2017-10-31 17:21
change `--dpr-version=tip` to `--drp-version=tip`
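
A misspelled flag like this is easy to miss, so one option is to lint the arguments before piping them to the installer. This is a sketch under assumptions: `check_install_args` is a made-up helper, and the accepted list contains only the arguments that appear in this conversation, so it is not the full set install.sh supports.

```shell
# Reject any argument that is not one of the install.sh arguments
# seen in this conversation (not an exhaustive list of real options).
# check_install_args is a hypothetical helper for illustration.
check_install_args() {
    for arg in "$@"; do
        case $arg in
            --isolated|--force|--drp-version=*|install) ;;
            *) echo "unrecognized argument: $arg" >&2; return 1 ;;
        esac
    done
}

# Usage:
# check_install_args --isolated --drp-version=tip --force install &&
#     curl -s get.rebar.digital/tip | bash -s -- --isolated --drp-version=tip --force install
```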

ctrees
2017-10-31 17:21
:stuck_out_tongue_winking_eye:

greg
2017-10-31 17:21
I know testing me.

ctrees
2017-10-31 17:23
no... my brain just puts everything 'right' cause internally I have to be perfect... the world stays in chaos...

ctrees
2017-10-31 17:23
... stupid reality anyway...

lae
2017-10-31 17:24
@greg so we also actually have a modified discovery bootenv too where we do a DNS lookup as fallback (since in our env hostname isn't passed over DHCP), I'm guessing we still need to use that since the start-up.sh script is still part of the discovery bootenv?

greg
2017-10-31 17:25
Yeah - interestingly enough, we've had similar thoughts from other customers as well.

greg
2017-10-31 17:26
I'd be fine with adding that into the tree in start-up.sh.

lae
2017-10-31 17:26
hmm

greg
2017-10-31 17:26
Though I've also thought about adding it as a stage/task that could be put into a discovery flow.

greg
2017-10-31 17:27
All start-up.sh was intended to be was a create machine step. That gets us to control.sh and a runner.


greg
2017-10-31 17:28
A machine must be created with a name, but that name can be changed later.

lae
2017-10-31 17:28
so lines 39 and 47-53

greg
2017-10-31 17:29
yeah - looks good

greg
2017-10-31 17:29
Let me see about including them.

greg
2017-10-31 17:29
and I'll fix the bootparams. I missed some, apparently.

greg
2017-10-31 17:29
:slightly_smiling_face:

ctrees
2017-10-31 17:30
humm.... check me again as I got the same result

ctrees
2017-10-31 17:31
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7S4ZAWP2/install_tip_3_2.txt and commented: Install tip 3.2 but got 3.1 I think

greg
2017-10-31 17:31
that is right.

greg
2017-10-31 17:31
Versions don't work that way

greg
2017-10-31 17:32
Let me explain.

greg
2017-10-31 17:32
v3.1.0-0 is the stable release version.

ctrees
2017-10-31 17:32
ok... I get it ... v3.2 is not tagged...

greg
2017-10-31 17:32
v3.1.0-tip-183 means tip, with the closest release being v3.1.0 and 183 commits since it.
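
The two version shapes seen in this conversation can be told apart mechanically. A small sketch, assuming only the formats `<release>-0-<hash>` and `<release>-tip-<count>-<hash>` shown here; `describe_version` is an illustrative name.

```shell
# Summarize a dr-provision version string of the two shapes seen above.
# describe_version is a hypothetical helper for illustration.
describe_version() {
    v=$1
    base=${v%%-*}    # e.g. v3.1.0
    rest=${v#*-}     # e.g. tip-183-<hash> or 0-<hash>
    case $rest in
        tip-*)
            commits=${rest#tip-}
            commits=${commits%%-*}
            echo "tip: $commits commits past $base"
            ;;
        *)
            echo "release: $base"
            ;;
    esac
}

# Usage:
# describe_version v3.1.0-tip-183-24e9aaa6360a28547eb65e292c773acefb50aad6
```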

ctrees
2017-10-31 17:33
so I'm at the right hash ?? correct ??

greg
2017-10-31 17:33
yes

greg
2017-10-31 17:41
@lae - added to content reorg changes.

ctrees
2017-10-31 17:53
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7TT5D28P/install.txt and commented: The iso upload names changed in install script output but not in endpoint (I'm sure it's just version issues)

ctrees
2017-10-31 17:54
ce-<name> vs <name>

greg
2017-10-31 17:54
may need to reload content.

greg
2017-10-31 17:56
stepping away for a while you crazies - back in a bit

i.grischott
2017-10-31 19:49
has joined #json

lae
2017-10-31 19:54
``` Log for Job: 52637be3-9137-4de7-a3f4-0a2db0183416 Error loading task content: json: cannot unmarshal object into Go value of type []*genmodels.JobAction, continuing ``` is this a known issue? happening on the change-stage task in discovery

greg
2017-10-31 19:55
What version of DRP are you running?

lae
2017-10-31 19:55
v3.1.0-tip-183-24e9aaa6360a28547eb65e292c773acefb50aad6

greg
2017-10-31 19:55
Do you have a change-stage/map variable defined?

lae
2017-10-31 19:55
hm

greg
2017-10-31 19:55
It shouldn't matter.

lae
2017-10-31 19:56
did you mean Parameter, or?

greg
2017-10-31 19:56
change-stage job is running and gives you that.

greg
2017-10-31 19:56
parameter

greg
2017-10-31 19:56
yes

greg
2017-10-31 19:57
what stage was this running in? what bootenv?

lae
2017-10-31 19:58
discover and discovery

greg
2017-10-31 19:58
basically, booted discover and got that in the change-stage log

lae
2017-10-31 19:58
I removed all my bootenvs/profiles and added community-content earlier

lae
2017-10-31 19:58
yeah

greg
2017-10-31 19:58
okay let me check.

lae
2017-10-31 19:59
although, I see there's an update to community-content, I guess let me try that?

greg
2017-10-31 19:59
wait

lae
2017-10-31 19:59
ok

greg
2017-10-31 19:59
I think this is a bug.

greg
2017-10-31 19:59
in the UX, go to workflows

lae
2017-10-31 19:59
change-stage/map isn't defined, but I'm not sure what it should be defined to

greg
2017-10-31 19:59
It should work with nothing and that is the bug.

greg
2017-10-31 20:00
For now, you can go into the workflow section.

greg
2017-10-31 20:00
Create a global change-stage/map with `discover`->`sledgehammer-wait`:`Success`

greg
2017-10-31 20:00
then go to the node and mark it runnable.

greg
2017-10-31 20:00
bulk actions, select the machine and click play.

lae
2017-10-31 20:02
bulk actions?

greg
2017-10-31 20:02
You must not be logged into the saas.

lae
2017-10-31 20:03
ah, yeah

greg
2017-10-31 20:03
You can edit the machine, there is a runnable toggle in the machine edit page.

greg
2017-10-31 20:03
set it to runnable and then it should rerun and complete.

lae
2017-10-31 20:03
I marked it as runnable, though that was a vertical ellipsis

lae
2017-10-31 20:04
``` root 1977 0.0 0.0 115256 1516 ? Ss 19:42 0:00 /bin/bash /tmp/control.sh root 2246 0.0 0.0 817788 18740 ? Sl 19:42 0:00 \_ /usr/local/bin/drpcli machines processjobs ea914aa8-ae33-4cde-a397-d2d58341e9a5 root 2405 0.0 0.0 115252 1452 ? S 19:58 0:00 \_ /bin/bash ./script root 2406 0.0 0.0 115252 644 ? S 19:58 0:00 \_ /bin/bash ./script root 2408 0.0 0.0 180760 3024 ? S 19:58 0:00 \_ curl -s -f -L -o jq http://192.168.124.11:8091/files/jq ``` noticed this change in the process list of the machine

lae
2017-10-31 20:04
aaaand that IP is incorrect

greg
2017-10-31 20:04
okay - that is a problem

greg
2017-10-31 20:05
What is your --static-ip on DRP set to?

lae
2017-10-31 20:05
change-stage worked though

lae
2017-10-31 20:05
it's not set :sweat_smile:

greg
2017-10-31 20:06
okay - so it should attempt to figure out the best value, but sometimes it can't. That is why I usually set it to the interface of the DRP machine that I expect default traffic to use.
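
One way to find a candidate address for `--static-ip` on a Linux endpoint is to read the `src` field that `ip route get` prints for a client address. The sketch below only parses such a line; `src_from_route` is a hypothetical helper and the sample line is made up for illustration.

```shell
# Print the "src" address from one line of `ip route get` output.
src_from_route() {
    set -f            # no globbing while the line is word-split
    set -- $1
    set +f
    while [ $# -gt 1 ]; do
        if [ "$1" = src ]; then
            echo "$2"
            return 0
        fi
        shift
    done
    return 1
}

# Made-up sample; on a live host it would come from: ip -4 route get <client IP>
sample="192.168.1.50 dev eth0 src 192.168.1.200 uid 0"
src_from_route "$sample"
```

The printed address is the one the endpoint would use to reach that client, so it is a reasonable value to hand to the `--static-ip` option greg mentions.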

lae
2017-10-31 20:06
alright, hold on, let me update some things to get that set

greg
2017-10-31 20:07
Can you post the first 20 lines or so of the change-stage.sh.tmpl file

greg
2017-10-31 20:07
I may have already fixed this bug.

lae
2017-10-31 20:08
```
~$ drpcli templates show change-stage.sh.tmpl | jq -r '.Contents' | head -20
#!/bin/bash
# This will contain a token appropriate for the path being
# used below. Either a create or update/show token
export RS_UUID="{{.Machine.UUID}}"
export RS_TOKEN="{{.GenerateToken}}"
# Make sure we have a drpcli and jq somewhere
ProvURL="{{.ProvisionerURL}}"
(mkdir -p /usr/local/bin; cd /usr/local/bin; curl -s -f -L -o jq "$ProvURL/files/jq"; chmod 755 jq)
PATH=$PATH:/usr/local/bin
drpcli info get | jq .features | grep -q '"sane-exit-codes"'
if [[ $? == 0 ]] ; then
    echo "DRP supports 'sane-exit-codes' using them ..."
    SUCCESS_CODE=0
    FAIL_CODE=1
    REBOOT_CODE=64
    STOP_CODE=16
else
```

greg
2017-10-31 20:08
hmm - okay - it shouldn't matter if change-stage/map is set or not.

greg
2017-10-31 20:08
Unless, the requiredparams is still set.

greg
2017-10-31 20:09
okay - that is it. Fixed in the next update to tip.

greg
2017-10-31 20:09
the render failed because params didn't match. Need to fix that error message though.

greg
2017-10-31 20:19
okay - have a fix for the error message as well.

ctrees
2017-10-31 20:25
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7TQ8181L/contentuploadfailed.png and commented: Is that the same issue (change-state)

ctrees
2017-10-31 20:27
I was attempting to refresh packages and add os-discovery os-linux

lae
2017-10-31 20:31

greg
2017-10-31 20:33
Yes - in my testing the stretch dists didn't have a security repo yet.

greg
2017-10-31 20:34
maybe it does now.

greg
2017-10-31 20:34
it didn't 4 days ago when I started testing.

lae
2017-10-31 20:35
weird, I've been using it fine for quite a while now :<

greg
2017-10-31 20:35
I'll put it back and try it.

greg
2017-10-31 20:44
trying all 4 debian-based with the change now.

lae
2017-10-31 20:45
there were 4 debian bootenvs?

lae
2017-10-31 20:45
oh ubuntu

lae
2017-10-31 20:46
I recall security_host and security_path being split at some point but don't remember if that was pre ubuntu-14 or after

greg
2017-10-31 20:54
Yeah - I get this for debian 9 without the change:
```
[!!] Configure the package manager

Cannot access repository

The repository on http://security.debian.org/debian-security couldn't be
accessed, so its updates will not be made available to you at this
time. You should investigate this later.

Commented out entries for http://security.debian.org/debian-security have
been added to the /etc/apt/sources.list file.

    <Go Back>    <Continue>

<Tab> moves; <Space> selects; <Enter> activates buttons
```

greg
2017-10-31 20:55
@lae - any ideas?

greg
2017-10-31 20:58
it seems like it should be there, but it doesn't seem to work for me.

greg
2017-10-31 21:00
If I continue, it will finish and work.

lae
2017-10-31 21:01
yeah it'll continue but it skips configuring the security repo in that case

lae
2017-10-31 21:01
hold on

lae
2017-10-31 21:17
Possible it's an IPv6 issue?

lae
2017-10-31 21:18
``` lae@laura:~$ curl --connect-timeout 3 -v -6 http://security.debian.org * Rebuilt URL to: http://security.debian.org/ * Hostname was NOT found in DNS cache * Trying 2607:ea00:101:3c0b::1deb:215... * After 1486ms connect time, move on! * connect to 2607:ea00:101:3c0b::1deb:215 port 80 failed: Connection timed out * Trying 2610:148:1f10:3::73... * Connected to http://security.debian.org (2610:148:1f10:3::73) port 80 (#0) > GET / HTTP/1.1 > User-Agent: curl/7.38.0 > Host: http://security.debian.org > Accept: */* > < HTTP/1.1 302 Found < Date: Tue, 31 Oct 2017 21:17:45 GMT * Server Apache is not blacklisted < Server: Apache < X-Content-Type-Options: nosniff < X-Frame-Options: sameorigin < Referrer-Policy: no-referrer < X-Xss-Protection: 1 < Location: https://www.debian.org/security/ < Cache-Control: max-age=120 < Expires: Tue, 31 Oct 2017 21:19:45 GMT < Content-Length: 285 < Content-Type: text/html; charset=iso-8859-1 < <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>302 Found</title> </head><body> <h1>Found</h1> <p>The document has moved <a href="https://www.debian.org/security/">here</a>.</p> <hr> <address>Apache Server at http://security.debian.org Port 80</address> </body></html> * Connection #0 to host http://security.debian.org left intact ``` 2607:ea00:101:3c0b::1deb:215 seems to be unresponsive

lae
2017-10-31 21:18
:<

greg
2017-10-31 21:21
I get this: ```W: The repository 'http://security.debian.org/debian-security/debian-security stretch/updates Release' does not have a Release file. N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use. N: See apt-secure(8) manpage for repository creation and user configuration details. E: Failed to fetch http://security.debian.org/debian-security/debian-security/dists/stretch/updates/main/source/Sources 404 Not Found [IP: 128.61.240.73 80] E: Some index files failed to download. They have been ignored, or old ones used instead.```

greg
2017-10-31 21:21
I put the lines back into the sources.list

lae
2017-10-31 21:21
debian-security is there twice

greg
2017-10-31 21:21
and apt-get updated

lae
2017-10-31 21:22
`/debian-security/debian-security`

greg
2017-10-31 21:23
likes that better

greg
2017-10-31 21:23
this is the line in the preseed: ``` d-i apt-setup/security_host string http://security.debian.org/debian-security ```

greg
2017-10-31 21:24
hmm it works for deb8 but not deb9.

greg
2017-10-31 21:25
the lines get built differently.

lae
2017-10-31 21:25
ok so in our environment, we have a packages mirror that currently proxies to http://security.debian.org and well I'm seeing a bunch of these lines in the access log: ` [30/Oct/2017:13:33:41 -0700] "GET /debian-security/debian-security/dists/stretch/updates/InRelease HTTP/1.1" 302 160 "-" "Debian APT-HTTP/1.3 (1.4.8)"`

lae
2017-10-31 21:26
the proxy basically redirects that to /debian-security though, so that probably explains why they don't 404

lae
2017-10-31 21:26

lae
2017-10-31 21:26
so I guess it should be fine to leave out /debian-security from the preseed?

greg
2017-10-31 21:26
I can try it

greg
2017-10-31 21:27
this : ```+d-i apt-setup/security_host string http://security.debian.org ```

greg
2017-10-31 21:27
without the +

greg
2017-10-31 21:40
okay that works.
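
The symptom lae spotted, a path segment repeated back to back, is easy to check for in request paths or `sources.list` URLs. A minimal sketch, where `doubled_segment` is a hypothetical helper and not part of any tool mentioned here:

```shell
# Print the first path segment that is immediately repeated, e.g. the
# "/debian-security/debian-security" seen in the proxy access log.
doubled_segment() (
    set -f          # keep the shell from glob-expanding the URL
    IFS=/
    prev=
    for seg in $1; do
        if [ -n "$seg" ] && [ "$seg" = "$prev" ]; then
            echo "$seg"
            exit 0
        fi
        prev=$seg
    done
    exit 1
)

doubled_segment "/debian-security/debian-security/dists/stretch/updates/InRelease"
```

The function returns non-zero when no segment repeats, so it can be used directly in an `if` while scanning a log.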

greg
2017-10-31 21:42
With that, I think my reorg is done.

greg
2017-10-31 21:43
@lae - thanks.

greg
2017-10-31 21:44
- I'm going to move tip on all the repos and push out new content. I'm also going to update the RackN SaaS content tables to reflect the repo changes.

greg
2017-10-31 21:44
During the baseball game.

greg
2017-10-31 21:44
I'll send out a new message on how to update your content.

greg
2017-10-31 21:45
The big thing will be to remove your existing read-only content and replace it with new content.


lae
2017-10-31 21:55
can you pull that in before you push out new content

greg
2017-10-31 21:55
Yes - I noticed that too and thought to fix it later, but cool.

greg
2017-10-31 21:59
@lae - I?m going to tweak it a little more.

lae
2017-10-31 21:59
mm ok

greg
2017-10-31 21:59
The remote access won't work with the local-repo.

greg
2017-10-31 21:59
local-repo is intended to be a truly offline mode.

greg
2017-10-31 21:59
For things that don't have inet access.

lae
2017-10-31 21:59
remote access?


lae
2017-10-31 22:00
ah yeah

greg
2017-10-31 22:00
if local-repo is true - those won't resolve.

greg
2017-10-31 22:00
generally.

lae
2017-10-31 22:00
locally we set that to our own internal hosts :stuck_out_tongue:

greg
2017-10-31 22:01
in which case, you won't use local-repo anyway. :wink:

greg
2017-10-31 22:05
@lae check now. Also added the else if not exists case.

lae
2017-10-31 22:09
do you think you could also specify another variable for local security repo?

greg
2017-11-01 00:06
Are you using local-repo? And need them separate?

greg
2017-11-01 00:06
@lae

ctrees
2017-11-01 12:39
Still getting: Content Upload Failed: ValidationError New layer violates key restrictions: keysCannotBeOverridden: runner.tmpl is already in layer 1 keysCannotBeOverridden: access-keys.sh.tmpl is already in layer 1 keysCannotBeOverridden: change-stage.sh.tmpl is already in layer 1

ctrees
2017-11-01 12:41
when attempting to add os-discovery Content Package

ctrees
2017-11-01 12:41
catmini:CodeOps cat$ ./dr-provision --version dr-provision2017/11/01 12:39:56.773713 Version: v3.1.0-tip-191-7fa9a4ded571250028c61003687bd672d386910d

wdennis
2017-11-01 13:30
If anyone is going to (or already at) LISA'17 in San Fran, @ me and let's meet up

wdennis
2017-11-01 13:30
Be there this afternoon, leaving Fri AM

spector
2017-11-01 14:29
@wdennis @shane I think Shane was going to try and stop by but I think it is doing a 2nd event as well.

greg
2017-11-01 14:38
@lae - let's work through your remaining issue and button things up. Something like this: local-security-repo true/false. If true, don't add the security repo lines. If false or not present, put the security line in place?

lae
2017-11-01 15:23
@greg I would be if we had that change since all of my other needed changes are in community content now

lae
2017-11-01 15:23
hm

greg
2017-11-01 15:23
okay - the question is: is what I described sufficient?

lae
2017-11-01 15:23
yeah, I think so

greg
2017-11-01 15:30
okay pushed something - testing now

greg
2017-11-01 15:50
That worked.

greg
2017-11-01 15:50
@lae - review the last comment when you can.

shane
2017-11-01 16:11
@wdennis - I'll be at LISA today - around Noon through the rest of the day if anyone cares to meet and discuss DRP ...etc... :slightly_smiling_face:

lae
2017-11-01 16:40
@greg ah sorry, I think I misunderstood and thought you meant the same way you implemented usage of the `local-repo` variable (because of the `if eq (.Param "local-repo") true` line). I want to be able to set `local-security-repo` to a local mirror of a security repository, if possible

lae
2017-11-01 16:41
I'm still waking up apparently

greg
2017-11-01 16:43
So a string

greg
2017-11-01 16:46
okay - I think I get it now.

lae
2017-11-01 16:48
mhm

greg
2017-11-01 16:49
like this:
```
{{if .ParamExists "local-security-repo" -}}
d-i apt-setup/security_host string {{.ParseUrl "host" (.Param "local-security-repo")}}
d-i apt-setup/security_path string {{.ParseUrl "path" (.Param "local-security-repo")}}
{{else -}}
{{if (eq "debian" .Env.OS.Family) -}}
d-i apt-setup/security_host string http://security.debian.org
{{else -}}
d-i apt-setup/security_host string http://archive.ubuntu.com
d-i apt-setup/security_path string /ubuntu
{{end -}}
{{end -}}
```

lae
2017-11-01 16:56
does security_path error on debian or will it just ignore it?

ctrees
2017-11-01 17:01
So does anyone know why I can't load a content package (even when I've log into the rackn beta)

greg
2017-11-01 17:01
not sure - but can preface it

greg
2017-11-01 17:01
Yeah - sorry - I forgot @ctrees.

greg
2017-11-01 17:01
Can you send me:

greg
2017-11-01 17:01
```drpcli contents list | jq .[].Name```

greg
2017-11-01 17:02
@lae
```
{{if .ParamExists "local-security-repo" -}}
{{if (eq "debian" .Env.OS.Family) -}}
d-i apt-setup/security_host string {{.ParseUrl "host" (.Param "local-security-repo")}}
{{else -}}
d-i apt-setup/security_host string {{.ParseUrl "host" (.Param "local-security-repo")}}
d-i apt-setup/security_path string {{.ParseUrl "path" (.Param "local-security-repo")}}
{{end -}}
{{else -}}
{{if (eq "debian" .Env.OS.Family) -}}
d-i apt-setup/security_host string http://security.debian.org
{{else -}}
d-i apt-setup/security_host string http://archive.ubuntu.com
d-i apt-setup/security_path string /ubuntu
{{end -}}
{{end -}}
```

ctrees
2017-11-01 17:02
catmini:CodeOps cat$ ./drpcli contents list | jq .[].Name null null null catmini:CodeOps cat$

greg
2017-11-01 17:02
oops

greg
2017-11-01 17:03
```drpcli contents list | jq .[].meta.Name```

ctrees
2017-11-01 17:03
catmini:CodeOps cat$ ./drpcli contents list | jq .[].meta.Name "BackingStore" "drp-community-content" "BasicStore" catmini:CodeOps cat$

greg
2017-11-01 17:04
@ctrees you are trying to update the content with the update button?

ctrees
2017-11-01 17:04
I did try that...

greg
2017-11-01 17:04
Just wanting to make sure where the error is coming from

lae
2017-11-01 17:04
@greg might want to use the Param as-is for debian, since it supports a string with a path

lae
2017-11-01 17:05
so no ParseUrl

greg
2017-11-01 17:05
okay - From the UX while logged in to both saas and DRP @ctrees

greg
2017-11-01 17:05
remove the `drp-community-content`

greg
2017-11-01 17:05
then add it back.

ctrees
2017-11-01 17:06
ok..

ctrees
2017-11-01 17:07
Ok... removed, Added back, then attempted to add os-discovery and got the Content Upload Failed: ValidationError

ctrees
2017-11-01 17:09
Content Upload Failed: ValidationError New layer violates key restrictions: keysCannotBeOverridden: change-stage/map is already in layer 1 keysCannotBeOverridden: gohai-inventory is already in layer 1 keysCannotBeOverridden: kernel-console is already in layer 1 keysCannotBeOverridden: access-keys is already in layer 1 keysCannotBeOverridden: access-ssh-root-mode is already in layer 1
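
The error text suggests the package being added defines keys that an already loaded layer also defines. Given two lists of keys, the overlap can be listed with standard tools. This is only a sketch: `overlapping_keys` is a hypothetical helper, `some-new-param` is a made-up name, and how DRP compares layers internally is not shown in this log.

```shell
# Print keys that appear in both newline-separated lists.
# Assumes neither list contains duplicates of its own.
overlapping_keys() {
    printf '%s\n%s\n' "$1" "$2" | sort | uniq -d
}

existing="change-stage/map
gohai-inventory
kernel-console"
incoming="kernel-console
some-new-param
change-stage/map"
overlapping_keys "$existing" "$incoming"
```

Any key printed here would trigger a `keysCannotBeOverridden` line of the kind ctrees pasted.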

ctrees
2017-11-01 17:10
[GIN] 2017/11/01 - 12:08:37 | 200 | 28.719µs | 192.168.1.200 | OPTIONS /api/v3/contents [GIN] 2017/11/01 - 12:08:37 | 200 | 1.343583ms | 192.168.1.200 | GET /api/v3/contents [GIN] 2017/11/01 - 12:08:48 | 200 | 12.763µs | 192.168.1.200 | OPTIONS /api/v3/contents?version=tip [GIN] 2017/11/01 - 12:08:49 | 422 | 173.273486ms | 192.168.1.200 | POST /api/v3/contents?version=tip

greg
2017-11-01 17:13
can you send me: ```drpcli stages list | jq .[].Name```

greg
2017-11-01 17:13
@lae -
```
{{if .ParamExists "local-security-repo" -}}
{{if (eq "debian" .Env.OS.Family) -}}
d-i apt-setup/security_host string {{.Param "local-security-repo"}}
{{else -}}
d-i apt-setup/security_host string {{.ParseUrl "host" (.Param "local-security-repo")}}
d-i apt-setup/security_path string {{.ParseUrl "path" (.Param "local-security-repo")}}
{{end -}}
{{else -}}
{{if .ParamExists "local-repo" -}}
{{if eq (.Param "local-repo") true -}}
# Use local-repo and !local-security-repo - no security to specify
{{else -}}
{{if (eq "debian" .Env.OS.Family) -}}
d-i apt-setup/security_host string http://security.debian.org
{{else -}}
d-i apt-setup/security_host string http://archive.ubuntu.com
d-i apt-setup/security_path string /ubuntu
{{end -}}
{{end -}}
{{end -}}
{{end -}}
```
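
The decision the template makes can be followed more easily as a plain shell function. This is a sketch of the intent discussed above, not the template itself: `security_lines` is a hypothetical name, the scheme/host/path split is only a rough stand-in for `.ParseUrl`, the `ubuntu-security` path is made up, and it assumes an unset `local-repo` falls through to the public defaults (the pasted template emits nothing in that case).

```shell
# $1 = OS family, $2 = local-security-repo (may be empty),
# $3 = local-repo ("true" or anything else)
security_lines() (
    family=$1 repo=$2 local_repo=$3
    if [ -n "$repo" ]; then
        if [ "$family" = debian ]; then
            # Debian accepts host and path together in one string.
            echo "d-i apt-setup/security_host string $repo"
        else
            # Rough stand-in for .ParseUrl: drop the scheme, split at the first "/".
            rest=${repo#*://}
            host=${rest%%/*}
            case $rest in
                */*) path=/${rest#*/} ;;
                *)   path= ;;
            esac
            echo "d-i apt-setup/security_host string $host"
            echo "d-i apt-setup/security_path string $path"
        fi
    elif [ "$local_repo" = true ]; then
        echo "# Use local-repo and !local-security-repo - no security to specify"
    elif [ "$family" = debian ]; then
        echo "d-i apt-setup/security_host string http://security.debian.org"
    else
        echo "d-i apt-setup/security_host string http://archive.ubuntu.com"
        echo "d-i apt-setup/security_path string /ubuntu"
    fi
)

security_lines ubuntu "http://packages.local/ubuntu-security" ""
```

A local security mirror always wins; `local-repo` only suppresses the security lines when no such mirror is given.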

greg
2017-11-01 17:14
@ctrees - tip may have leaked out.

greg
2017-11-01 17:15
You may not need os-discovery.

shane
2017-11-01 17:17
@greg - remove the 'local' and that other file ??

shane
2017-11-01 17:17
not looking at a working drp right now

greg
2017-11-01 17:18
well - I think our SaaS got a dev update that I'm working on and it is getting in the way. I suspect travis/SaaS interaction is a little funky.

greg
2017-11-01 17:18
I'm about to push this all anyway and document a procedure.

greg
2017-11-01 17:18
@ctrees may be starting it.

ctrees
2017-11-01 17:18
yea I remember you telling wdennis that os-discovery was going into core ?

greg
2017-11-01 17:18
I suspect that @ctrees doesn't need os-discovery and os-linux anymore.

ctrees
2017-11-01 17:18
me too...

greg
2017-11-01 17:19
```drpcli stages list | jq .[].Name```

greg
2017-11-01 17:19
will probably show lots of stages

greg
2017-11-01 17:19
The next step will be to convert the defaults from ce-* to *

greg
2017-11-01 17:20
and check machines stages and bootenvs to make sure they are converted over.

ctrees
2017-11-01 17:21
@ctrees suspects @ctrees issue is just as the tip and content packages ... should I just wait for @greg merge fix'n

ctrees
2017-11-01 17:22
I'm not really in a rush.... but willing to test things... just let me know... I'll go learn more of the drpcli commands

lae
2017-11-01 17:23
@greg lgtm

greg
2017-11-01 17:23
@lae - thanks. testing the not specified path .

greg
2017-11-01 17:24
@ctrees - hopefully a few hours and I'll make a consistent tip step.

lae
2017-11-01 17:37
Oh yeah, I just remembered (while cleaning up our content to be fireeye-only) the other part to the `part-scheme` feature I wanted to implement - have the part-scheme templates be usable on both centos/debian (so checks included for OS family). Is it alright if I push some changes for that? (not urgent since we're not doing any centos installs anytime soon, I don't think)

greg
2017-11-01 17:37
yes

greg
2017-11-01 17:37
That sounds good

lae
2017-11-01 20:17
@greg are you working on documentation to build content? or is it possible it's been done already (haven't looked into building drbundle yet)

greg
2017-11-01 20:18
That is on the list but not done or really started.

greg
2017-11-01 20:19
drbundler allows you to do this:
```
go get -u github.com/digitalrebar/provision/cmds/drbundler
PATH=$PATH:$GOPATH/bin
```

greg
2017-11-01 20:20
in a go1.9 build environment

greg
2017-11-01 20:20
```drbundler <directory> <yaml file>```

greg
2017-11-01 20:21
the directory is a digitalrebar/store in directory format.

greg
2017-11-01 20:21
That bundles it up into a digitalrebar/store file format.

greg
2017-11-01 20:21
The content API uses the file format as the transport form

shane
2017-11-01 20:22
@lae - that's on my plate - I'm working on documentation updates, but haven't gotten to building custom content yet

greg
2017-11-01 20:23
You can also bundle with drpcli. You would cd into the directory, run `drpcli contents bundle ../file.yaml --format=yaml`. It will produce the same thing.

lae
2017-11-01 20:33
it looked like for the `provision-content` repo you had a drp-community-content.yml file but I'm not sure what the contents of that should be

greg
2017-11-01 20:34
So in tip, there are now two directories.

greg
2017-11-01 20:34
content and contrib - one for each of the two content bundles.

lae
2017-11-01 20:34
mhm, realised that

lae
2017-11-01 20:34
(just reading through package.sh)

greg
2017-11-01 20:48
- TIP UPDATE - all components have been updated and should be updated as a set (really).

greg
2017-11-01 20:48
To update to tip, you should do so in the following order. More specific steps for each will follow.

greg
2017-11-01 20:49
1. Update DRP to tip
2. Remove Old Content
3. Update/Add Content
4. Update plugins
5. Fix up things

greg
2017-11-01 20:49
Okay so the first step - updating DRP

greg
2017-11-01 20:49
If you are running isolated, I do this: ```curl -fsSL https://raw.githubusercontent.com/digitalrebar/provision/tip/tools/install.sh | bash -s -- --isolated install --drp-version=tip --force```

greg
2017-11-01 20:50
This will force the update of the local binaries to tip. Make sure you stop drp.

greg
2017-11-01 20:50
You can do this for provision as well. If you modified your service file, you should check to make sure that it is still valid.

greg
2017-11-01 20:50
Restart drp

greg
2017-11-01 20:51
Don't forget to copy `drpcli` to wherever you keep it so that it is always available. :slightly_smiling_face:

greg
2017-11-01 20:51
Second Step - Remove old content

greg
2017-11-01 20:52
With the rework of content, you need to remove the following content packages.

greg
2017-11-01 20:52
os-linux

greg
2017-11-01 20:52
os-discovery

greg
2017-11-01 20:52
drp-community-content (if you are really behind, Digital Rebar Community Content).

greg
2017-11-01 20:53
ipmi

greg
2017-11-01 20:53
packet

greg
2017-11-01 20:53
virtualbox

greg
2017-11-01 20:53
Ensure those are gone.

greg
2017-11-01 20:54
Third Step- Put the content back.

greg
2017-11-01 20:54
drp-community-content - it is a must; just get it.

greg
2017-11-01 20:54
task-library - New RackN library of services for doing interesting things.

greg
2017-11-01 20:55
drp-community-contrib - this is old or experimental things like centos6 or SL6.

greg
2017-11-01 20:55
Step Four - update the plugins.

greg
2017-11-01 20:55
If you have any plugins installed, update them now.

greg
2017-11-01 20:57
To facilitate version tracking, plugins provide their own content, injected by the plugin itself. When the plugin is added, it will also add a content layer that will show up in the content packages section.

greg
2017-11-01 20:57
Step Five - fix things up. This is mainly if you were using the ce-* version of things.

greg
2017-11-01 20:58
AND making sure all the bootenvs are up to date. This is a task you should always do after updating content.

greg
2017-11-01 20:58
Go to BootEnvs and make sure that discovery, sledgehammer, and your OS install images are still good. These could have updated and new ISOs need to be downloaded.

greg
2017-11-01 20:58
Fix those.

greg
2017-11-01 20:59
Then go to Info and preferences and make sure your default stage and bootenvs are still valid.

greg
2017-11-01 20:59
this is where `ce-sledgehammer` becomes `sledgehammer` and `ce-discovery` becomes `discovery`

greg
2017-11-01 21:00
The same with `ce-ubuntu-16.04-install` becomes `ubuntu-16.04-install`.

greg
2017-11-01 21:00
The same with `ce-centos-7.4.1708-install` becomes `centos-7-install`.
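
The renames in step five follow a simple rule, which the sketch below captures. It covers only the names greg lists; `new_name` is a hypothetical helper, and other bootenvs or stages may need their own special cases.

```shell
# Map an old ce-* bootenv/stage name to its post-reorg name.
new_name() (
    name=${1#ce-}    # drop the "ce-" prefix if present
    case $name in
        # The CentOS install bootenv also lost its point-release suffix.
        centos-7.4.1708-install) name=centos-7-install ;;
    esac
    echo "$name"
)

for old in ce-sledgehammer ce-discovery ce-ubuntu-16.04-install ce-centos-7.4.1708-install; do
    echo "$old -> $(new_name "$old")"
done
```

Names without the `ce-` prefix pass through unchanged, so the function is safe to run over machines that were already converted.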

ctrees
2017-11-01 21:05
I'm up to step 5, so far so good (loading iso now)

greg
2017-11-01 21:06
Cool! One last thing. Make sure your machines' stages and bootenvs are valid and update them if not.

ctrees
2017-11-01 21:07
will do... I'm doing before and after screen shots too... in hopes that I don't need them :wink:

ctrees
2017-11-01 21:09
I was attempting to figure out how to do the content package remove and update via command line... but seems the gui is involved in gen of tokens ... so used GUI

greg
2017-11-01 21:16
well the GUI (or UX as we call it) has the login access to the SaaS to get the content. You can use the cli if you have the yaml files locally. We don't have a way for you to get this outside of the UX yet.

greg
2017-11-01 21:16
The content flow is: SaaS -> UX -> DRP

greg
2017-11-01 21:16
where the UX acts as a bridge between the two systems.

shane
2017-11-01 21:28
@ctrees - you can do via CLI - it's just not "very pretty" yet, since we don't have the `drpcli` binary baked w/ the content download pieces yet

shane
2017-11-01 21:28
```
drpcli contents list                            # see all contents
drpcli contents list | jq -r '.[].meta.Name'    # get raw output of just the content packs
drpcli contents destroy <name>                  # remove content

# go to RackN UX - log in, go to Hamburger menu (upper left, 3 horizontal lines)
# go to Organization - User Profile - copy your UUID for Unique User Identity
export RACKN_AUTH="?username=<UUID_Unique_User_Identity>"
export CATALOG="https://qww9e4paf1.execute-api.us-west-2.amazonaws.com/main/catalog"
curl -s $CATALOG/content/<content_name>${RACKN_AUTH} -o <content_name>.json
drpcli contents create -< <content_name>.json

# same steps for Plugins - but replace "content_name" with "plugin_name"
# change "contents" commands to "plugin_provider"
```

shane
2017-11-01 21:30
you can (obviously) chuck the output of the `contents list` and `jq` filter command in to a loop to vaguely automate destroy/curl/create operations
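
The loop shane alludes to could look something like the dry-run sketch below. It prints the `drpcli contents destroy` commands instead of running them, and it only matches the package names greg listed for removal; `plan_removals` is a hypothetical helper.

```shell
# Read content package names on stdin and print a destroy command
# for each one that is on the removal list from the tip update steps.
plan_removals() {
    while IFS= read -r name; do
        case $name in
            os-linux|os-discovery|drp-community-content|"Digital Rebar Community Content"|ipmi|packet|virtualbox)
                printf 'drpcli contents destroy "%s"\n' "$name"
                ;;
        esac
    done
}

# Dry run against the names ctrees reported earlier; nothing is deleted.
printf '%s\n' BackingStore drp-community-content BasicStore | plan_removals
```

Against a live endpoint the input would come from `drpcli contents list | jq -r '.[].meta.Name'`, and the printed commands could be piped to `sh` once they have been reviewed.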

ctrees
2017-11-01 21:31
So I was having an issue with sledgehammer coming up as a stage... but I only loaded : drp-community-content(tip) and task-library(tip)

greg
2017-11-01 21:31
`sledgehammer` should be in `drp-community-content`

ctrees
2017-11-01 21:33
ok... I see it there... so should it show up in the Pref dropdown ?

ctrees
2017-11-01 21:33
and in stages, it's an X

greg
2017-11-01 21:34
if it is available. You have to check the bootenv and the stages.

greg
2017-11-01 21:34
if you open it, it should tell you what the error is.

ctrees
2017-11-01 21:34
it's in bootenv... checking status

greg
2017-11-01 21:34
iso upload and explode is now async.

greg
2017-11-01 21:35
So you may need to refresh the screen to see if it finished.

ctrees
2017-11-01 21:39
how long should I wait after iso upload for it to explode ?

ctrees
2017-11-01 21:40
It's in bootenv, but not in Stages

ctrees
2017-11-01 21:41
I've logged out and back in... and hit page refresh and Refresh button in Stages

greg
2017-11-01 21:41
Could take awhile, but not too long. 10 minutes at most. - oh wait.

greg
2017-11-01 21:41
did you do ```drpcli bootenvs uploadiso sledgehammer```

greg
2017-11-01 21:41
That will do the magic for you. or most of it.

greg
2017-11-01 21:41
I need to run. I'll be on later.

ctrees
2017-11-01 21:42
catmini:CodeOps cat$ ./dr-provision --version dr-provision2017/11/01 20:54:11.070347 Version: v3.1.0-tip-193-12226aa05308b164a18f164546146eac7c549986 catmini:CodeOps cat$ ./drpcli bootenvs uploadiso sledgehammer catmini:CodeOps cat$ ./drpcli bootenvs uploadiso centos-7-install catmini:CodeOps cat$ ./drpcli bootenvs uploadiso ubuntu-16.04-install catmini:CodeOps cat$

greg
2017-11-01 21:43
```drpcli bootenvs show sledgehammer```

greg
2017-11-01 21:43
That should show what it thinks the errors are

ctrees
2017-11-01 21:49
catmini:CodeOps cat$ ./drpcli bootenvs show sledgehammer { "Available": true, "BootParams": "rootflags=loop root=live:/sledgehammer.iso rootfstype=auto ro liveimg rd_NO_LUKS rd_NO_MD rd_NO_DM provisioner.web={{.ProvisionerURL}} rs.uuid={{.Machine.UUID}} rs.api={{.ApiURL}} -- {{if .ParamExists \"kernel-console\"}}{{.Param \"kernel-console\"}}{{end}}", "Description": "Ram-Only image loaded with tools to allow for discovery and maintenance", "Errors": [], "Initrds": [ "stage1.img"

ctrees
2017-11-01 21:49
seems fine...

ctrees
2017-11-01 21:50
Boot Environments - sledgehammer (click on it) looks fine...

ctrees
2017-11-01 21:52
Stages - discover - Error: Stage discover wants BootEnv sledgehammer, which is not available

ctrees
2017-11-01 22:10
So.... reboot of server fixed it

ctrees
2017-11-01 22:15
and I THINK I'm getting your overall pattern... saas <file.yml> loads lots of defaults WHICH then drp-data content can 'update'

ctrees
2017-11-01 22:44
ohh... pretty new workflow icons

ctrees
2017-11-01 22:53
So... I got a VBox to start to boot... then it failed to load stage2.img

ctrees
2017-11-01 22:55
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7TG0BJPM/stage2_loadfail.png and commented: VBOX pxe to endpoint 192.168.1.200 from 192.168.33.10 via host vboxnet0 192.168.33.1

ctrees
2017-11-01 22:59
I'm sending broadcast to 192.168.33.1 (vboxnet0) ... route ?

ctrees
2017-11-01 23:00
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F7T34CTU1/stage2_loadfail_netstat.png and commented: netstat -nr on mac hosting endpoint

ctrees
2017-11-01 23:33
I started drp up on the 'fake' vboxnet0 IP 192.168.33.1 and it was able to register... I guess I'll ask in the AM how you guys suggest a dev laptop setup...

lae
2017-11-02 00:54
``` Nov 02 00:50:43 labs-provision dr-provision[26688]: dr-provision2017/11/02 00:50:43.588393 Static FS: Failed to render template for /machines/c25b7315-3d50-4134-9074-5cda9abaeee5/seed: template: :589:5: executing "net-seed.tmpl" at <eq (.Param "local-re...>: error calling eq: incompatible types for comparison ``` @greg okay so I realised I made a grave assumption of what `local-repo` had meant, didn't realise it was specifically a boolean to configure the installer to use a repo from the exploded ISO until now

lae
2017-11-02 00:56
gotta go now but basically I have a need to use a locally hosted mirror for debian/ubuntu/centos

greg
2017-11-02 01:13
@lae I was waiting for this.

greg
2017-11-02 01:14
@lae we can work on a change. I'm in favor of it. We need a better way.

lae
2017-11-02 15:20
@greg I'm thinking about this more thoroughly:
centos/rhel "primary" mirror is specified by `url --url`, debian/ubuntu by `mirror/http`
centos extra repos/mirrors are specified by `repo --name=X --baseurl=X`, and debian/ubuntu by `apt-setup/local#/repository $mirror $dists` with optional comment/deb-src toggle/URL to key
I can use profiles to specify mirrors for different stages (so diff for ubuntu/deb/cent), so maybe a single `mirrors` array of objects parameter could work here? would it be doable to filter out something like a "primary" mirror for all dists and a "security" mirror for deb/ubuntu based on a child parameter? e.g.
```
mirrors:
  - name: main
    url: http://packages.local/debian
    dist: stretch main
    install_mirror: true
  - name: updates
    url: http://packages.local/debian
    dist: stretch-updates main
  - name: security
    url: http://packages.local/debian-security
    dist: stretch/updates main
    security_mirror: true
```

greg
2017-11-02 15:54
@lae, I need to think about this a little bit more. I like where this is going. You can then do profile specification of secondary.

greg
2017-11-02 15:55
I'd add a type in the field, primary, secondary, security.

greg
2017-11-02 15:56
maybe - thinking - need to look at a couple of other things first.

greg
2017-11-02 15:58
nvm - no type

greg
2017-11-02 15:58
you have it in `install_mirror` and `security_mirror`

vlowther
2017-11-02 15:59
@lae On the mirrors thing -- do you mostly care about the repos used during the OS install process, or the ones used afterwards?

lae
2017-11-02 15:59
both

vlowther
2017-11-02 16:00
because one of the things I have been kicking around is how to reduce the amount of information we need to make a given bootenv work down to just needing the kernel/initrd pair

lae
2017-11-02 16:00
while I *could* just configure them with ansible afterwards, we've just been using preseed/kickstart to configure them for quite a while

lae
2017-11-02 16:00
the mirrors var would be optional, though?

vlowther
2017-11-02 16:02
I have been reluctant to use distro mirroring schemes

vlowther
2017-11-02 16:02
as it makes fully offline deploys much trickier.

lae
2017-11-02 16:02
what do you mean?

2017-11-02 16:03
When did core become private? I do not have access to it.

vlowther
2017-11-02 16:04
If I want to run DRP in an environment where I do not have Internet access at all, I cannot rely on whatever default lists of mirrors to try for distro ops.

lae
2017-11-02 16:05
that's...true but that's not my use case

lae
2017-11-02 16:05
we have local mirrors within our own network

vlowther
2017-11-02 16:06
ok

vlowther
2017-11-02 16:06
What I did for DRv2 was to provide an overarching "repos to use" list: https://github.com/digitalrebar/digitalrebar/blob/master/core/barclamps/rebar.yml#L157

vlowther
2017-11-02 16:08

vlowther
2017-11-02 16:08
Would they meet your needs?

vlowther
2017-11-02 16:09
They do not solve the current requirement that DRP needs its own local mirror for OS installation purposes, but that would be a separate step.

lae
2017-11-02 16:10
yeah, that wouldn't

lae
2017-11-02 16:11
I also don't need to introduce DR in our environment, it'll just cause information overload for people on my team without much benefit over our existing infra

vlowther
2017-11-02 16:11
ok

lae
2017-11-02 16:12
our environment also doesn't have that large of a pipe to the Internet, so for deployments of 5 nodes installing using Internet-hosted repos takes a pretty long time

vlowther
2017-11-02 16:12
I was asking if porting those roles over to DRP tasks would be sufficient

lae
2017-11-02 16:12
and the debian bootenv I don't believe has a fully usable "local-repo" as it currently is specified (using the mini iso)

lae
2017-11-02 16:13
oh

vlowther
2017-11-02 16:13
ya, the Debian bootenvs were more an exercise in minimalism than anything else.

lae
2017-11-02 16:13
Yeah, that would probably work then, but I would still need to be able to specify a repo for installation

vlowther
2017-11-02 16:14
Swalling out the netinstall isos for a more full-featured iso should work with minimal tweaks.

vlowther
2017-11-02 16:14
er, swapping

greg
2017-11-02 16:17
Okay - so there are two problems.

greg
2017-11-02 16:17
1. installation repo specification. We want two values for that: exploded iso in drp, and user specified URL.

greg
2017-11-02 16:18
2. Post-install repos - We want those done in post-install (kickstart,preseed) step.

greg
2017-11-02 16:19
Porting DR code as a task/parameter/stage would take care of #2. The specification structure may want to be simplified if possible. The task should be able to run in both the ks/seed env and post install, but that is more a goal.

greg
2017-11-02 16:20
With regard to #1, the current local-repo isn't sufficient - it only points at the exploded ISO's repos. A specifier needs to be used. If it could use the same spec as #2, that would work.

vlowther
2017-11-02 16:23
ya -- in addition to #1, we would need a codepath for taking care of kernel and initrd handling.

greg
2017-11-02 16:31
This last one would be for handling not exploding isos at all and just referencing kernel/initrd images only.

vlowther
2017-11-02 16:37
hm... I had forgotten how annoying the whole security repo vs. non security repo thing is for Debianoids

vlowther
2017-11-02 16:43
ok

vlowther
2017-11-02 16:43
so how about this as a strategy

vlowther
2017-11-02 16:44
The stuff I currently use for DRv2 is not really suited to the OS installation phase

vlowther
2017-11-02 16:44
because it is based around working with file snippets, not raw repo information

vlowther
2017-11-02 16:45
So instead, we have a two-part solution.

vlowther
2017-11-02 16:46
1: A parameter that defines repo information in as OS-agnostic a fashion as we can reasonably define

vlowther
2017-11-02 16:46
something along the lines of what you described upstream, lae

vlowther
2017-11-02 16:47
The second part is code baked into DRP that knows how to take repos defined in that parameter and expand them in the context of a target OS

vlowther
2017-11-02 16:49
so that we can get repo definitions appropriate for writing to /etc/apt/sources.list, /etc/yum.repos.d, preseed lines, or kickstart lines

vlowther
2017-11-02 16:53
The goal is to have something like this: {{range .ReposFor "target-os"}} {{ .Repo "desired-format" . }} {{ end }}

vlowther
2017-11-02 16:54
and have that sequence spit out the repo information for the OS we want and in the format we want.
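
A rough Python analogue of that template sequence, as a sketch only. The real helpers would live in DRP's Go template rendering; `repos_for`, `render_repo`, and the `os`/`url`/`distribution`/`components` field names here are assumptions made for illustration. The two output formats follow the apt and kickstart line shapes mentioned earlier in this thread.

```python
# Hypothetical repo definitions; field names are assumed for this sketch.
REPOS = [
    {"name": "debian-9 install", "os": "debian-9",
     "url": "http://mirrors.kernel.org/debian",
     "distribution": "stretch", "components": ["main", "contrib"]},
    {"name": "centos-7 install", "os": "centos-7",
     "url": "http://mirrors.kernel.org/centos/7/os/x86_64"},
]


def repos_for(repos, target_os):
    """Return the repo definitions whose `os` matches target_os exactly."""
    return [repo for repo in repos if repo.get("os") == target_os]


def render_repo(repo, fmt):
    """Expand one repo definition into a single line of the wanted format."""
    if fmt == "apt":
        components = " ".join(repo["components"])
        return f"deb {repo['url']} {repo['distribution']} {components}"
    if fmt == "kickstart":
        # Replace spaces so the name is a single kickstart token.
        name = repo["name"].replace(" ", "-")
        return f"repo --name={name} --baseurl={repo['url']}"
    raise ValueError(f"unknown format: {fmt}")


def render_all(repos, target_os, fmt):
    """Equivalent of ranging over ReposFor and calling Repo on each entry."""
    return "\n".join(render_repo(r, fmt) for r in repos_for(repos, target_os))


if __name__ == "__main__":
    print(render_all(REPOS, "debian-9", "apt"))
    print(render_all(REPOS, "centos-7", "kickstart"))
```

Keeping the OS filter and the format renderer separate mirrors the two-part split described above: the parameter stays OS agnostic and only the renderer knows about file formats.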

vlowther
2017-11-02 17:01
We already have a defined place in our template rendering functions to plug helpers like this

vlowther
2017-11-02 17:01
and getting this into DRP is a matter of porting some code I have lying around in some older versions of DRv2.

shane
2017-11-02 17:02
@vlowther do you already have something that can take an abstract "this repo info" and produce all of the apt/yum/ks/seed elements appropriately ?

shane
2017-11-02 17:03
that's not too hard to do - just an annoying exercise ... :slightly_smiling_face:

vlowther
2017-11-02 17:04
I have written code to do that in at least 2 languages so far.

vlowther
2017-11-02 17:04
:slightly_smiling_face:

lae
2017-11-02 17:12
haha

vlowther
2017-11-02 17:31
One of which was some gnarly bash.

vlowther
2017-11-02 17:36
@lae That sound like a reasonable path forward?

lae
2017-11-02 17:38
Yeah, that does

lae
2017-11-02 17:40
does feel like it's somewhat trespassing boundaries of what DRP the binary should do vs what templates should do, but I don't really have a strong opinion on that

vlowther
2017-11-02 17:45
Eh, I tend to err on the side of making the templates easier to write and read

vlowther
2017-11-02 17:46
and the current solution does not make them easy to read.

vlowther
2017-11-02 17:46
and since DRP is the thing doing the template expansion...

vlowther
2017-11-02 17:46
ok

lae
2017-11-02 17:47
yeah, I can see this getting out of hand if it was just done through templates only with current DRP

vlowther
2017-11-02 17:47
ok

lae
2017-11-02 17:47
the local-repo/local-security-repo thing took a bit of wrapping my head around

vlowther
2017-11-02 17:47
I will get started on the DRP side of this path.

lae
2017-11-02 17:48
:+1:

vlowther
2017-11-02 17:48
Should have something to review in a day or so.

lae
2017-11-02 17:50
is drpcli machines processjobs supposed to exit?

lae
2017-11-02 17:51
it seems to process through all of the tasks for the debian-9-install stage but...

lae
2017-11-02 17:52
I tried unsetting change-stage/map, then creating a new one with debian-9-install -> complete though it seems to still get stuck

lae
2017-11-02 17:53
setting the stage manually to `none` for the machine externally seems to let it proceed fine

shane
2017-11-02 17:53
@lae - set it to complete-nowait:Success

lae
2017-11-02 17:53
ah the nowait one

greg
2017-11-02 18:26
I need to document the use of the stages.

vlowther
2017-11-02 20:52
OK, after some more hacking and research, here is what I propose for a package-repositories parameter:

vlowther
2017-11-02 20:53
- name: "centos-7 install"
  os: "centos-7"
  # If installSource is true, then the URL points directly
  # to the location we should use for all OS install purposes
  # save for fetching kernel/initrd pairs from (for now, we will
  # still assume that they will live on the DRP server).
  # The os field must be an exact match for the bootenv's OS.Name field.
  installSource: true
  # For redhat-ish distros, this URL contains distro,
  # component, and arch components, and as such
  # they do not need to be further specified
  url: "http://mirrors.kernel.org/centos/7/os/x86_64"
- name: "centos-7 everything"
  # Since installSource is not true here,
  # we can define several package sources at once by
  # providing a distribution and a components section,
  # and having the URL point at the top-level directory
  # where everything is housed
  os: centos-7
  url: "http://mirrors.kernel.org/centos"
  distribution: "7"
  components:
    - atomic
    - centosplus
    - cloud
    - configmanagement
    - cr
    - dotnet
    - extras
    - fasttrack
    - opstools
    - os
    - paas
    - rt
    - sclo
    - storage
    - updates
- name: "debian-9 install"
  os: "debian-9"
  installSource: true
  # Debian URLs always follow the same rules, no matter
  # whether the OS install flag is set. As such,
  # you must always also specify the distribution and
  # at least the main component, although you can also
  # specify other components.
  url: "http://mirrors.kernel.org/debian"
  distribution: stretch
  components:
    - main
    - contrib
    - non-free
- name: "debian-9 backports"
  os: "debian-9"
  url: "http://mirrors.kernel.org/debian"
  distribution: stretch-updates
  components:
    - main
    - contrib
    - non-free
- name: "debian-9 security updates"
  os: "debian-9"
  url: "http://security.debian.org/debian-security/"
  securitySource: true
  distribution: stretch/updates
  components:
    - contrib
    - main
    - non-free

vlowther
2017-11-02 20:53
(sorry for the spam, but it includes comments!)
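
The rules spelled out in those comments could be checked with something like the Python sketch below. It is only an illustration under stated assumptions: `validate_repo` is a hypothetical helper, and deciding that an entry is Debian-like from an `os` prefix of `debian` or `ubuntu` is a guess, not how DRP decides it.

```python
def validate_repo(repo):
    """Return a list of problems found in one package-repositories entry."""
    problems = []
    # Every entry in the proposal carries a name, an os, and a url.
    for field in ("name", "os", "url"):
        if not repo.get(field):
            problems.append(f"missing {field}")
    # Assumption: Debian-like entries are recognised by their os prefix.
    if repo.get("os", "").startswith(("debian", "ubuntu")):
        if not repo.get("distribution"):
            problems.append("debian-like repos need a distribution")
        if "main" not in repo.get("components", []):
            problems.append("debian-like repos need at least the main component")
    return problems


if __name__ == "__main__":
    good = {"name": "debian-9 install", "os": "debian-9",
            "installSource": True,
            "url": "http://mirrors.kernel.org/debian",
            "distribution": "stretch", "components": ["main", "contrib"]}
    bad = {"name": "debian-9 broken", "os": "debian-9",
           "url": "http://mirrors.kernel.org/debian"}
    print(validate_repo(good))
    print(validate_repo(bad))
```

Red Hat style install sources pass with only a URL, matching the comment that their URL already contains the distro, component, and arch.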

shane
2017-11-02 20:55
oye @vlowther! use a "text snippet" for that length of paste, please :slightly_smiling_face:

vlowther
2017-11-02 21:04
As a matter of policy, I only do that when Slack tells me to, :stuck_out_tongue:

vlowther
2017-11-02 21:10
@lae That look sane to you?

shane
2017-11-02 21:12
well the nice thing about text snippets is it'll also do color context highlighting which makes it a LOT easier to read

lae
2017-11-02 21:39
^

lae
2017-11-02 21:43
@vlowther how do you specify arch for centos?

vlowther
2017-11-02 21:45
Right now I am going to let it autodetect.

lae
2017-11-02 21:46
through DRP?

vlowther
2017-11-02 21:46
ya, based on whatever arch the node we are installing is.

vlowther
2017-11-02 21:47
For Centos7 this will be pretty easy. :wink:

vlowther
2017-11-02 21:50
and for all yum-like repo formats it boils down to just using $basearch in the URL line for the individual repo
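
For the multi-component case, a Python sketch of how a yum-style stanza might be assembled with `$basearch` left for yum to expand. The URL layout (`url/distribution/component/$basearch`) is inferred from the centos-7 example URLs above; `yum_repo_stanzas` and the section naming are hypothetical.

```python
def yum_repo_stanzas(repo):
    """Build one .repo stanza per component, leaving $basearch for yum."""
    stanzas = []
    for component in repo["components"]:
        repo_id = f"{repo['os']}-{component}"
        # $basearch stays literal; yum substitutes the machine arch itself.
        baseurl = (f"{repo['url']}/{repo['distribution']}"
                   f"/{component}/$basearch")
        stanzas.append("\n".join([
            f"[{repo_id}]",
            f"name={repo['name']} {component}",
            f"baseurl={baseurl}",
            "enabled=1",
        ]))
    return "\n\n".join(stanzas)


if __name__ == "__main__":
    everything = {
        "name": "centos-7 everything",
        "os": "centos-7",
        "url": "http://mirrors.kernel.org/centos",
        "distribution": "7",
        "components": ["os", "updates"],
    }
    print(yum_repo_stanzas(everything))
```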

vlowther
2017-11-02 21:51
The .Repo template function will be responsible for building the urls appropriately for known operating system types.

vlowther
2017-11-02 21:52
I plan on supporting RPM distros that use yum style .repo files and deb distros that use apt initially.

vlowther
2017-11-02 21:53
Weirder stuff can be added on an as-needed basis.

vlowther
2017-11-02 21:54
much as I like my arch linux install and pacman for package management, it doesn't exactly have a large marketshare or a decent way of doing unattended installs. :slightly_smiling_face:

lae
2017-11-02 22:09
ah right forgot about $basearch

lae
2017-11-02 22:10
also yeah, unattended installs of arch :joy:

lae
2017-11-03 00:02
```
Starting Task: change-stage (8af3d3ff-4bc8-4abe-b369-944fe82a16ea)
Running Task Template: change-stage.sh.tmpl
Command change-stage.sh.tmpl failed to start: fork/exec ./script: no such file or directory
Task Template , change-stage.sh.tmpl, failed
Task: change-stage failed
```
woops

lae
2017-11-03 00:03
(no bash on this installer)

lae
2017-11-03 00:34
(i'm trying a statically compiled bash aaand it turns out the kernel on this is 32 bit woops)

greg
2017-11-03 00:39
I have a pending change scheduled to make change-stage part of the runner. It will avoid this problem. Not there yet though.

lae
2017-11-03 00:41
ok atm I guess I'm making a smaller task just to run drpcli to change stage to complete-nowait for these particular images

vlowther
2017-11-03 13:57
@lae no bash?

vlowther
2017-11-03 13:57
Heresy.

ctrees
2017-11-03 18:16
Just listened to @zehicle and @wdennis youtube.... I'm willing to write docs... in the past I did it by just rebuilding things over and over attempting the marketing 'demo'... So I'd basically do that for a DR demo of what @wdennis was describing.

ctrees
2017-11-03 18:22
What I was doing was writing myself docs to do @greg VBox demo and then move to @zehicle kubespray demo as I'm under the impression that DRP is really "PXE to Node" - (inventory def) -> "Node to Cluster (or system)"... which I intend to use Ansible (as I think @wdennis also intends)

ctrees
2017-11-03 18:30
... ANYWAY... workflow wise... you guys are pretty good at showing your working demos... I'm thinking if I just do what I'm doing (which is going through the video and attempting to re-create) then cross link the resources and documents... what I did in the past was to link resource much like the angular docs but also with video step links ( https://docs.angularjs.org/api/ng/service/$document )

vlowther
2017-11-03 19:46
@lae and other interested parties: https://github.com/digitalrebar/provision/pull/530 is the start of baking basic repo management support into DRP at template rendering time.

vlowther
2017-11-03 19:48
If you want better names for the functions that the templates will use to render templates, now is the time to suggest better ones. Preferably as review comments :slightly_smiling_face:

zehicle
2017-11-04 19:57
are you asking for "repo" vs something else?

david.bruce
2017-11-06 00:43
has joined #json

greg
2017-11-06 21:28

spector
2017-11-06 21:29
Congrats

spencerj
2017-11-06 21:30
:+1:

wdennis
2017-11-06 22:15
@greg Is it doable to upgrade from v3.2.0-tip-3-00bcb20b04826393bd426478ee260c553225e463 to v3.2.1 ??

greg
2017-11-06 22:15
Yes

shane
2017-11-06 22:15
3.2.0 to 3.2.1 should be easy

shane
2017-11-06 22:15
just the dr-provision binary needs to be replaced

shane
2017-11-06 23:37
tomorrow's meetup details are posted ... we look forward to seeing you if you can make it ... : https://www.meetup.com/digitalrebar/events/243490141/

wdennis
2017-11-07 00:58
@shane can I curl/wget the v3.2.1 dr-provision binary from somewhere?

shane
2017-11-07 01:02
not just the binary - but you can get the zip file and just extract the binary - if you check the installer script (https://get.rebar.digital/stable), you'll see:
```
echo "Installing Version $DRP_VERSION of Digital Rebar Provision"
curl -sfL -o dr-provision.zip https://github.com/digitalrebar/provision/releases/download/$DRP_VERSION/dr-provision.zip
curl -sfL -o dr-provision.sha256 https://github.com/digitalrebar/provision/releases/download/$DRP_VERSION/dr-provision.sha256
```

shane
2017-11-07 01:02
so substitute the DRP_VERSION (eg "stable") for the variable - and you can wget / curl it directly
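
The substitution can be sketched in Python as below. The URL pattern is taken from the installer lines quoted above; the assumption that the `.sha256` file uses the usual `sha256sum` layout (hex digest first, then the file name) is mine and should be checked against a real release. `release_urls` and `sha256_matches` are hypothetical names.

```python
import hashlib

RELEASE_BASE = "https://github.com/digitalrebar/provision/releases/download"


def release_urls(version):
    """Return the zip and checksum URLs for a release tag such as v3.2.1."""
    base = f"{RELEASE_BASE}/{version}"
    return {
        "zip": f"{base}/dr-provision.zip",
        "sha256": f"{base}/dr-provision.sha256",
    }


def sha256_matches(data, checksum_text):
    """Compare data's SHA-256 with the first field of a sha256sum-style line."""
    expected = checksum_text.split()[0].lower()
    return hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    for kind, url in release_urls("v3.2.1").items():
        print(kind, url)
```

Download both URLs with curl or wget as described, then pass the zip bytes and the checksum file text to `sha256_matches` before extracting the binary.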

wdennis
2017-11-07 01:12
OK, done, thx

wdennis
2017-11-07 01:21
Is there any reason that in UX's "Machines", when you edit a node, you cannot change the "Name" of the node?

wdennis
2017-11-07 01:22
I can see making the UUID immutable, but not the Name?

greg
2017-11-07 01:24
Name should be changeable. Fqdn is required

wdennis
2017-11-07 01:25
FQDN? I've always just used a shortname for "Name"

lae
2017-11-07 01:26
I think in the UI it's not changeable

lae
2017-11-07 01:27
yeah, I can't edit it from the UI

lae
2017-11-07 01:27
or at least, it's not obvious how

wdennis
2017-11-07 01:27
I don't know why, I can change it thru drpcli

lae
2017-11-07 01:27
I would expect that to be a UI bug

shane
2017-11-07 01:29
that is a UX feature which hasn't been implemented yet ... please feel free to submit an enhancement request ...

wdennis
2017-11-07 01:34
OK, done - #537

shane
2017-11-07 01:35
thx!

wdennis
2017-11-07 15:15
ssh root@912.168.1.143

shane
2017-11-07 15:58
Is that some cool new v4 IP address space ? :slightly_smiling_face:

2017-11-07 18:28
Hey Guys........Can I use DRP to install Firmware on Dell Servers?

2017-11-07 18:41
Not yet. I have not ported the dell-firmware-flash role over from digitalrebar yet.

2017-11-07 20:19
so for now...If I have to use digital rebar to update the firmware...is it not possible?

shane
2017-11-07 20:20
hello @No1 - it's possible that it can be integrated if you have already done some of the automation with the firmware/bios tools - they can be "dropped in place"

shane
2017-11-07 20:21
but @vlowther is referring to the work we've done in Digital Rebar v2 (DRv2) - which hasn't been ported in to Digital Rebar Provision v3 (DRPv3)

2017-11-07 20:21
Ohh I see...

2017-11-07 20:21
Excuse me here for my bluntness... Can we use DigitalRebar instead of Foreman?

2017-11-07 20:22
and can Digital Rebar do all the things that can be done by foreman?

lae
2017-11-07 20:22
I don't think DR manages virtual machines/vm hosts like foreman can

greg
2017-11-07 20:23
well, DRP can install and provision them, but currently can't create new instances.

lae
2017-11-07 20:24
right

lae
2017-11-07 20:24
(I mean I'm also using it to provision VMs, but I'm using Proxmox for managing them)

greg
2017-11-07 20:25
drp?

lae
2017-11-07 20:26
yes

greg
2017-11-07 20:26
I'm interested in knowing what path you took to do it?

lae
2017-11-07 20:28
It's...the same as any other physical machine? I just create a machine definition in DRP and let the KVM host PXE boot with DRP's instructions.

2017-11-07 20:29
yeah...we want to use it to provision the physical hardware...?

greg
2017-11-07 20:29
okay - yeah. Sorry, I thought you meant that you were having DRP create the machine. Okay - I understand now.

lae
2017-11-07 20:29
Ah yeah no, although I can see a possibility of having Ansible do both the creation of a VM in Proxmox and then a machine definition in DRP

2017-11-07 20:29
are there any tutorials available on how to deploy DRP and then use it?

greg
2017-11-07 20:30
@lae, I think if you look at the new content, you should be able to use stage-chooser to not even create the machine beforehand if you want, but ...

lae
2017-11-07 20:30
Terraform too if only there were a maintained Proxmox provider

greg
2017-11-07 20:31
Yeah - that would be nice. @shane is working on some examples with packet for that.

shane
2017-11-07 20:32
@No1 - have you seen the quickstart documentation?? http://provision.readthedocs.io/en/latest/doc/quickstart.html

2017-11-07 20:33
yeah I did .. but from there how to proceed further I don't have any idea ... i will dig deep

shane
2017-11-07 20:41
do you have specific questions ? issues ?

2017-11-07 21:06
I mean would I get an userinterface to provision the systems?

2017-11-07 21:07
to define the subnets/domain/operating systems etc.?

zehicle
2017-11-07 21:12
@No1 - if you connect to https://[endpoint ip]:8092 then you will be redirected to the UX

zehicle
2017-11-07 21:12
there are a lot of videos available that show how to work the system

2017-11-07 21:12
okay..thanks lemme try that !


zehicle
2017-11-07 21:13
note: it's actually https://[endpoint ip]:8092/ui

zehicle
2017-11-07 21:13
the REST api is https://[endpoint ip]:8092/api/v3

2017-11-07 22:18
okay that's great...looks like the videos got updated since I saw them last time.

2017-11-07 22:18
Thanks Zehicle :)

zehicle
2017-11-07 22:35
Glad to help. If it's been a while, then a lot has changed.

wdennis
2017-11-07 23:26
So, let me try to define Stages and Tasks...

wdennis
2017-11-07 23:27
Stages have a [optional] BootEnv, [optional] Profiles, and a list of Tasks

wdennis
2017-11-07 23:28
The list of tasks [are | are not] processed serially by a Runner <-- pls advise as to which is correct

wdennis
2017-11-07 23:29
The stage _usually_ ends in a RunnerWait state, but may not

wdennis
2017-11-07 23:30
They also _may_ contain OptionalParams, RequiredParams, and Templates

wdennis
2017-11-07 23:32
Tasks have Templates they render, _usually_ have OptionalParams, and _may_ have RequiredParams

wdennis
2017-11-07 23:34
So, Stages are collections of Tasks, which the Runner processes, then when it hits the end, it [usually] waits for more Tasks to be submitted, which may be the result of a Stage change (which has its own list of Tasks)

wdennis
2017-11-07 23:36
(May be good to have official definitions of these somewhere, and a graphic showing the relationship and interaction with the Runner process)

wdennis
2017-11-07 23:37
(Let me know if I'm even close on the above def's)

shane
2017-11-07 23:53
@shane uploaded a file: https://rackn.slack.com/files/U6QFVRJNB/F7W3M7P1P/runner-workflow.pdf and commented: Work In Progress - but this is what runner workflow looks like ...

greg
2017-11-08 00:46
The runner processes the list of tasks on a machine in order and stops on first failure.

vlowther
2017-11-08 02:41
Precisely what happens is encapsulated in the large comment at the top of https://github.com/digitalrebar/provision/blob/master/backend/jobs.go
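
In outline, that in-order, stop-on-first-failure behaviour looks like the Python sketch below. It is a toy model only: `run_tasks` and the task names are hypothetical, and the real runner also deals with jobs, stage changes, and waiting for new tasks.

```python
def run_tasks(tasks, run_one):
    """Run tasks in order; stop at the first failure.

    Returns (completed, failed) where failed is None if every task passed.
    """
    completed = []
    for task in tasks:
        if not run_one(task):
            # First failure ends the run; later tasks are never started.
            return completed, task
        completed.append(task)
    return completed, None


if __name__ == "__main__":
    tasks = ["set-hostname", "change-stage", "never-reached"]
    # Pretend change-stage fails, as it did on the installer without bash.
    done, failed = run_tasks(tasks, lambda name: name != "change-stage")
    print("completed:", done)
    print("failed:", failed)
```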

vlowther
2017-11-08 02:42
We should probably turn that (and the POST logic for api/v3/jobs at https://github.com/digitalrebar/provision/blob/master/frontend/jobs.go#L241) into an actual document.

ctrees
2017-11-08 14:50

ctrees
2017-11-08 14:51
ietf IoT firmware update working draft

lae
2017-11-08 21:21
got around to updating the drpcli package in AUR to 3.2.1 https://aur.archlinux.org/packages/drpcli/

vlowther
2017-11-08 22:15
aieee!

vlowther
2017-11-08 22:16
you have AUR rights!

vlowther
2017-11-08 22:17
Feel free to do one for dr-provision as well. :slightly_smiling_face:

vlowther
2017-11-08 22:19
I will also gleefully accept a PR for a pkgbuild

vlowther
2017-11-08 22:19
I could use it locally. :slightly_smiling_face:

vlowther
2017-11-08 22:20
although I will have to refresh my makepkg memory. It has been awhile.

lae
2017-11-08 22:28
do you run dr-provision on arch? I don't exactly have that requirement (it gets deployed in a Debian LXC container running in our engineering environment)

vlowther
2017-11-08 22:28
In fact, I do.

vlowther
2017-11-08 22:30
Although in my case, it is from locally-built source, not from the pre-built tarballs. :slightly_smiling_face:

vlowther
2017-11-08 22:31
sudo systemctl stop dr-provision && tools/install.sh install && sudo systemctl start dr-provision is a fairly common thing for me.

vlowther
2017-11-08 22:33
In fact, I will whip up a -git version

justin
2017-11-09 14:24
has joined #json

shane
2017-11-09 14:41
good morning/@justin - welcome

ctrees
2017-11-09 16:47
Was going through kubespray and ran into jujucharms... has anyone used it? seems to be terraform-'like'

will.acheson
2017-11-09 17:16
has joined #json

shane
2017-11-09 17:28
welcome @will.acheson

will.acheson
2017-11-09 17:29
Hey shane! Thanks for the invite. I think it was a great idea for us to use slack for comms.

shane
2017-11-09 17:29
:slightly_smiling_face:

zehicle
2017-11-09 17:53
@ctrees juju is pretty much a Canonical thing, not nearly as mainstream - it's the basis of all their installers so Ubuntu focused. It's pretty interesting in how it builds a deployment graph.

zehicle
2017-11-09 17:56
typically, Juju is coupled with MaaS (which I consider a DRP alternative) because of the Canonical angle.

zehicle
2017-11-09 17:58
@justin if you want to follow-up on the twitter thread about RR, this is the place. We can talk about using the runner for post-provision in Sledgehammer (or any O/S) or other approaches.

zehicle
2017-11-09 17:58
I believe the runner workflow was a topic on the last community meeting (which was recorded).

shane
2017-11-09 17:59
@zehicle @justin - the community meeting centered around `stages`, we indirectly touched on the runner in that presentation - there was some spirited discussion on the Runner (tasks/jobs/queues/etc) related to Stages after the slide-ware presentation

shane
2017-11-09 18:00
the recorded video can be found at: http://bit.ly/2yfRXVW

vlowther
2017-11-09 18:08
Baked-in repo management support in dr-provision is about ready: https://github.com/digitalrebar/provision/pull/530

vlowther
2017-11-09 18:08
@lae ^^

vlowther
2017-11-09 18:12
@ctrees DRv2 built and maintained a complete graph of everything that should happen to all machines. It actually turned out to be harder to explain and made things too rigid once the complexity of what any given workload was trying to do got past a certain point.

vlowther
2017-11-09 18:15
is why DRP has a per-machine list of tasks that get executed in order by the runner (in-order execution is easy to explain and reason about), and a mechanism for making bulk changes to that list (stages)

ctrees
2017-11-09 18:19
Thanks... going over the stages in community and @wdennis summary is really helping it to sink in...

ctrees
2017-11-09 18:29
I'm going into UNI tomorrow to talk to Dr. Paul Gray... then it's off to talk to prof's at ISU... basically going to see what he thinks of the kubespray and terraform demo's for his microservices classes I think he's using Ceph storage now, but had an AFS (OpenAFS) stack also (which ISU also has) Dr. Gray is a Proxmox fan. My boss just wants to make sure he can hire out of the Universities to support what-ever...

marco.simoes
2017-11-09 19:00
has joined #json

shane
2017-11-09 19:17
welcome @marco.simoes

zehicle
2017-11-09 19:41
@ctrees cool and good luck

zehicle
2017-11-09 19:46
I've heard of Proxmox but don't know any users. How light weight?

ctrees
2017-11-09 20:17
So... I take it @shane 5min example is 'sort-of' what provision's CI/CD is/will be ?

ctrees
2017-11-09 20:18
Oh... I'm sure if DocGray likes this, I'll be doing a proxmox setup...

ctrees
2017-11-09 20:19
and I'm pretty sure he will... I already mentioned PXE swagger API and he lit up...

zehicle
2017-11-09 20:27
Takes a special kind of geek to love that phrase. Our kind of geek fwiw

ctrees
2017-11-09 20:27
I'm going to end up doing a Xen one too... plus supporting some Vagrant up... (though I hope to remove as much ruby code as possible as I move old puppet to ansible)

shane
2017-11-09 22:27
@ctrees - the 5min-drp stuff is a good model for integrating DRP into a CI/CD pipeline - right now it has "packet specific" plugin_provider, but it's pretty easy to tweak it to change-up the various content you inject

zehicle
2017-11-09 22:29
@ctrees after listening to the OpenStack discussions about Edge, there's a chance that Proxmox would be interesting if it's lighter weight than openstack

shane
2017-11-09 22:31
proxmox has grown a lot since I last looked at it - it used to be just a lightweight "manage VM compute instances on KVM"

shane
2017-11-09 22:31
seems to have matured a fair bit beyond just that


lae
2017-11-09 23:24
but yeah, I mean it still is lightweight compared to openstack

ctrees
2017-11-10 14:34
@lae THANKS (nice repo)

shane
2017-11-10 15:03
@lae - I think the Moon is lightweight compared to OpenStack now ...

zehicle
2017-11-10 17:02
hides from all the shade

shane
2017-11-10 17:03
Moon Shade....

vlowther
2017-11-10 17:06
The repo management patch has been merged. You can try it out by building from source and doing the needful based on https://github.com/digitalrebar/provision/blob/master/doc/arch/data.rst#package-repositories

greg
2017-11-10 17:09
tip has it too.

vlowther
2017-11-10 17:10
The next content and DRP release will have a valid param definition for package-repositories and the server will know how to handle templates that use the .InstallRepos and .MachineRepos helpers.

vlowther
2017-11-10 17:10
The rest of the default templates will be converted over the next few releases.

vlowther
2017-11-10 17:12
The default behaviour in the absence of any defined repos is to fall back to the current local-repos behavior, so current content will continue to function normally.

shane
2017-11-10 17:20
Nice!

justin
2017-11-12 02:53
I can't assume anyone is hanging out here on a Saturday night. I FINALLY got systems to boot to DRP

shane
2017-11-12 02:54
woot!

shane
2017-11-12 02:54
(yeah - I'm hanging out here...)

shane
2017-11-12 02:54
what was the hurdle ?

greg
2017-11-12 03:02
Nice!!!

justin
2017-11-12 03:18
lots of hurdles. I already have dhcp on my network so enabling it in dr-provision will break other clients. I tried running a dhcp proxy server https://github.com/digitalrebar/provision/issues/532 but had multiple problems with that (port conflicts on a single machine, legacy and efi hosts)

justin
2017-11-12 03:19
so now I literally have a CD in the drive that ipxe boots to DRP

shane
2017-11-12 03:20
proxy DHCP service isn't something we've enabled at the moment (as you are painfully aware) ... we've got it in the issues list, and we'll add it to the back log

justin
2017-11-12 03:20
but I need to read through docs on what to actually do with it. First I'm looking to see how to put unknown machines into a discovery mode (so I don't have to manually add them). Then I need to figure out how to actually provision a k8s cluster with it

shane
2017-11-12 03:20
we do support external DHCP - we just assume a full-featured DHCP server implementation - not the hobbled versions you'll find in (I think it was) wifi routers/etc.

shane
2017-11-12 03:21
I just added that documentation to the quickstart - for "latest" doc revision

justin
2017-11-12 03:21
right, I'm running now with `--disable-dhcp` but I needed a proxy (or in my case a CD) to do anything with it


shane
2017-11-12 03:21
make sure you're on "latest" version

justin
2017-11-12 03:21
I thought sledgehammer would autodiscover machines but I guess I was wrong. Reading through the docs now

shane
2017-11-12 03:22
the basics are to set the "prefs"

shane
2017-11-12 03:22
by default - we attempt to "do no harm" first and foremost

shane
2017-11-12 03:22
so you have to set the default stage/unknown bootenv/default bootenv ... so "discovery" will be enabled after those are set

shane
2017-11-12 03:28
once your Machines are discovered after setting Prefs - you can then add BootEnvs to them and reboot them to be installed. Advanced workflows and adding the IPMI plugin to do machine reboots from DRP can be done as well; those require a RackN registered account

shane
2017-11-12 03:29
you can also pre-add machines if you know MACs - by setting a reservation - using the "MAC" as the "Strategy", and the "Token" is the MAC address itself
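
As a sketch, such a reservation could be built in Python like this. The `Strategy` and `Token` values follow the description above; the `Addr` field name, the lowercase colon-separated MAC form, and the `mac_reservation` helper are assumptions to verify against the DRP API docs before use.

```python
import json
import re

MAC_PATTERN = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}")


def mac_reservation(ip_addr, mac):
    """Build a reservation that ties ip_addr to a machine's MAC address."""
    # Assumption: tokens are lowercase and colon separated.
    normalized = mac.strip().lower().replace("-", ":")
    if not MAC_PATTERN.fullmatch(normalized):
        raise ValueError(f"not a MAC address: {mac!r}")
    return {"Addr": ip_addr, "Strategy": "MAC", "Token": normalized}


if __name__ == "__main__":
    payload = mac_reservation("192.168.124.10", "AA-BB-CC-00-11-22")
    print(json.dumps(payload, indent=2))
```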

justin
2017-11-12 03:30
`drpcli prefs set unknownBootEnv discovery defaultBootEnv sledgehammer defaultStage discovery` gave Error: POST: prefs: defaultStage: Stage discovery does not exist

shane
2017-11-12 03:30
what version of DRP are you running? ```drpcli info get```

shane
2017-11-12 03:31
did you do a `drpcli bootenvs uploadiso sledgehammer` ??

justin
2017-11-12 03:31
version v3.2.1-0-2ab654478528d1ee59781f7d53bc8f8b9c6853dd

justin
2017-11-12 03:31
I uploaded the iso in 3.2.0. Let me run it again to make sure it has the right image

shane
2017-11-12 03:32
have to rerun it - as the sledgehammer image gets updated, and you'll need to make sure the content (v3.2.1) that requests sledgehammer matches the version of sledgehammer that is needed

shane
2017-11-12 03:32
are you using the UX ?

justin
2017-11-12 03:33
I logged into the web interface but mostly using cli

shane
2017-11-12 03:33
it's easy to check status of Stages - go to "stages" (oddly enough) - and make sure you have a check mark and not X next to the stage

shane
2017-11-12 03:33
crap

shane
2017-11-12 03:33
my fault

justin
2017-11-12 03:33
same error after uploading the sledgehammer iso

shane
2017-11-12 03:33
`defaultStage discover` (no 'y' on end)

shane
2017-11-12 03:35
hurriedly checks in doc patch ....

justin
2017-11-12 03:37
rebooting system to see if it does the right thing now

shane
2017-11-12 03:38
ok - fixed doc ...

justin
2017-11-12 03:41
k, it added the machine and now in stage discover. The system looks like it loaded sledgehammer and then rebooted to local disk. Is that expected?

justin
2017-11-12 03:41
I'm used to foreman which keeps the system in the discovery image until you decide to provision it

shane
2017-11-12 03:56
can you paste `drpcli prefs list` here ?

shane
2017-11-12 03:57
if `defaultBootEnv` is set to `local`, then that's what it'd do

shane
2017-11-12 03:58
uh - do `drpcli prefs list | grep -v Secret`

shane
2017-11-12 03:58
I don't wanna see your secrets

justin
2017-11-12 05:33
Sorry, got distracted with other things. Looking at this again now ```{ "debugBootEnv": "0", "debugDhcp": "0", "debugFrontend": "1", "debugPlugins": "0", "debugRenderer": "0", "defaultBootEnv": "sledgehammer", "defaultStage": "discover", "knownTokenTimeout": "3600", "unknownBootEnv": "discovery", "unknownTokenTimeout": "600" }```

justin
2017-11-12 05:33
I need to burn some more CDs so I can leave them in the drive. That way I don't have to go out in the garage

greg
2017-11-12 06:20
It should have stayed in sledgehammer @justin

greg
2017-11-12 06:21
Unless you have a workflow defined

justin
2017-11-12 06:46
nope, no workflows defined. I just redid my boot iso. Then going to look at the DRP settings again

justin
2017-11-12 08:04
Well I'm going to call it a night. I tried provisioning centos7 on one system but couldn't figure out the necessary steps. @shane I'm assuming you want to update discovery -> discover http://provision.readthedocs.io/en/stable/doc/operation.html#preference-setting

greg
2017-11-12 16:37
He did on the latest tree. Once we push the next release stable will update.

2017-11-12 18:34
Hi all, I seem to be missing something. In the docs you claim 5mins to install on a R-Pi, but there are no ARM binaries?

shane
2017-11-12 18:37
hi @chriscowley - welcome ...

2017-11-12 18:38
hi @rackneng

shane
2017-11-12 18:38
Install can indeed happen in under 5mins - however, I believe our ARM builds were dropped due to lack of interest - if you have an ARM platform w/ Go 1.9 on it - you can pretty easily build from source

shane
2017-11-12 18:38
if there's enough interest in ARM platform - we definitely would add it back in to the builds

2017-11-12 18:38
Given the proliferation of R-Pis in the world, I think it would be cool - at least remove it from the docs :-)

shane
2017-11-12 18:39
can you point me to which doc specifically you're referring to ?

shane
2017-11-12 18:39
I'm working on some doc cleanup right now - I'll address that

shane
2017-11-12 18:39
(BTW - this is Shane, pleased to meet you)

2017-11-12 18:45
http://rebar.digital/#overview "Our extensible stand-alone DHCP/PXE/IPXE service has minimal overhead so it can be installed and provisioning in under 5 minutes on a laptop, RPi or switch"

shane
2017-11-12 18:48
Ah yes - the claim is still true, we just haven't released ARM binaries for a while. Thank you for pointing that out - I'm not hacking on those docs at the moment, but we'll get that cleaned up. It's possible that @greg can add cross-compile support for ARM via our existing build system - not sure off hand how hard that will be to add in though ...

shane
2017-11-12 18:49
In the meantime - I'm taking off for a motorcycle ride ... back in a bit ... :slightly_smiling_face:

greg
2017-11-12 18:51
The compilation is easy. The packaging isn't too bad. The challenge is the target.

greg
2017-11-12 18:51
Finishing a Sunday. Thing.

shane
2017-11-12 20:01
@chriscowley - if we cut an ARM release for you - do you have a RPi you'll play with it on ??

2017-11-12 20:02
Odroid actually (which is ARM64)

shane
2017-11-12 20:07
if you intend to run w/ the TFTP services enabled for DRP - you'll need the bsdtar, p7zip, and unzip tools installed in your Linux OS - those are (currently) the only external dependencies we have.

2017-11-12 20:07
I know

shane
2017-11-12 20:07
We have plans to get away from them as external dependencies, but the Go Lang libraries are still lacking in ISO support features we need
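A quick pre-flight check for those external tools could look like the sketch below. The executable names are an assumption: the chat names the tools as bsdtar, p7zip and unzip, but the binary that a p7zip package installs is often called `7z` or `7za`, so adjust the list for your distribution.

```python
import shutil

# Assumed executable names; package names and binary names can differ.
REQUIRED_TOOLS = ("bsdtar", "7z", "unzip")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("missing:", missing_tools() or "none")
```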

shane
2017-11-12 20:07
Cool ... just wanted to highlight that. :slightly_smiling_face:

2017-11-12 20:10
@rackneng I can read the docs - even if I am apparently not capable of reading the architecture of the golang binary I download :-( (arm6l != arm64)

2017-11-12 20:11
I've probably got an R-Pi I can test on too

shane
2017-11-12 21:24
- before you get your Turkey fix ... we hope you plan on joining us for our 5th installment of the Digital Rebar Provision online meetup ... our primary discussion will be around the Runner and Jobs as they relate to stage transitions - this was a hot topic we touched on in the previous meetup - and we'll continue in more depth ... see the Meetup pages for more details, RSVP, and link to full agenda: https://www.meetup.com/digitalrebar/events/243490159/

shane
2017-11-12 21:25
Meetup is scheduled for Tuesday November 21st at 11am PST ...

yusuf.hussein
2017-11-13 14:42
has joined #json

2017-11-13 15:03
hello

shane
2017-11-13 15:04
good morning, @hyusuf01 welcome

zehicle
2017-11-13 16:45
UX added a feature over the weekend... you can now rename your org and endpoints in your org information. This is handy if you have multiple endpoints and switch between them

zehicle
2017-11-13 16:46
on request, RackN can create orgs that are shared by multiple users.

yusuf.hussein
2017-11-13 16:49
ok . thank you

spencerj
2017-11-13 20:39
Can someone explain what the sledgehammer image is exactly? when I PXE a system with sledgehammer as the bootenv should I expect to see any kind of install screen or any output from BMC console?

shane
2017-11-13 20:40
hi @spencerj

spencerj
2017-11-13 20:40
Hey Shane! :slightly_smiling_face:

shane
2017-11-13 20:40
Sledgehammer is a live boot linux distro (based on Centos)

shane
2017-11-13 20:40
it only "live boots" - it does NOT install

shane
2017-11-13 20:40
we use it as a helper to perform workflow tasks (prep physical server for install, collect inventory, etc.)

shane
2017-11-13 20:41
its primary purpose is to help implement an OS install - by "discovering" Machine info, and enabling more advanced OS install workflow scenarios

shane
2017-11-13 20:41
does that help understand it a bit more ?

spencerj
2017-11-13 20:42
okay cool! and yes it does! I'm probably doing something wrong because I just stood up DRP in "isolated" mode, setup my subnet with DHCP reservations and then PXE'd another system. watching the logs DRP got the request, found the reservation and issued the IP... but then nothing else happened.. the system failed out PXE.

shane
2017-11-13 20:43
that's a "safety mechanism"

spencerj
2017-11-13 20:43
ohhhhhh

shane
2017-11-13 20:43
we do not do any install unless you tell us to

spencerj
2017-11-13 20:43
even for sledgehammer?

spencerj
2017-11-13 20:43
that's the default bootenv I setup.

shane
2017-11-13 20:44
yes - can you copy the `drpcli prefs list | grep -v Secret` output here ?

spencerj
2017-11-13 20:45
``` { "debugBootEnv": "0", "debugDhcp": "0", "debugFrontend": "1", "debugPlugins": "0", "debugRenderer": "0", "defaultBootEnv": "sledgehammer", "defaultStage": "discover", "knownTokenTimeout": "3600", "unknownBootEnv": "discovery", "unknownTokenTimeout": "600" } ```

shane
2017-11-13 20:45
based on this - and assuming your subnet specification is right (along with required or optional Reservations configs)

shane
2017-11-13 20:45
you should boot in to Sledgehammer OS instance - and then stop

shane
2017-11-13 20:46
after this - you'd want to manually specify a BootEnv for OS install (eg `ubuntu-16.04-install`)

shane
2017-11-13 20:46
and reboot the Machine

shane
2017-11-13 20:46
... or ...

shane
2017-11-13 20:46
delve in to the world of our Workflow (stages) to automate the process

spencerj
2017-11-13 20:47
yeah.. after the DHCP response I saw something about a "file not found" or "couldn't download file" or something along those lines but then it failed too quickly for me to capture..

spencerj
2017-11-13 20:47
rebooting the node now to see if I can screenshot it.

shane
2017-11-13 20:48
does `drpcli bootenvs show sledgehammer | jq '.Available'` return "true" ?

spencerj
2017-11-13 20:49
yes

shane
2017-11-13 20:50
also make sure that you don't have any FW rules blocking ports 67, 69, 8091, and 8092 on the DRP Endpoint

spencerj
2017-11-13 20:50
the PXE error I get is "No boot filename received."

spencerj
2017-11-13 20:50
firewall is disabled completely along with SELinux.

shane
2017-11-13 20:51
are you using the built-in DHCP/TFTP - or external services ?

spencerj
2017-11-13 20:53
internal I assume.. I didn't set anything up external.

shane
2017-11-13 21:01
@spencerj do you have another DHCP server on the network ?

spencerj
2017-11-13 21:02
there shouldn't be.. I'm on a private VLAN.

shane
2017-11-13 21:02
does your DRP Endpoint have multiple NIC interfaces ?

shane
2017-11-13 21:02
a simple test would be to disable DRP on the host, then reboot your Machine to see if it gets a DHCP response

spencerj
2017-11-13 21:02
yes.. it's my "jumpnode" into the VLAN.. so it has a routable IP and private IP on separate interfaces.

shane
2017-11-13 21:03
ah - you may need to set the `--static-ip=` option to `dr-provision` to the correct network that you are trying to provision on

spencerj
2017-11-13 21:04
:flushed:

spencerj
2017-11-13 21:04
LOL

spencerj
2017-11-13 21:04
I was wondering about that when I ran the little "install" script.

spencerj
2017-11-13 21:04
I used the production IP because I figured that was needed to ensure the GUI was accessible.

spencerj
2017-11-13 21:05
I setup the subnet to listen on the private vlan though..

spencerj
2017-11-13 21:05
what's the easiest way to point the --static-ip option at the right address?

vlowther
2017-11-13 21:06
Try not specifying it first.

spencerj
2017-11-13 21:06
?? you mean just re-run the script?

vlowther
2017-11-13 21:07
When dr-provision ran, did it run with a --static-ip option set?

spencerj
2017-11-13 21:08
yes.. from the docs I ran this: `sudo ./dr-provision --static-ip=<production_ip> --base-root=/root/dr-test/drp-data --local-content="" --default-content="" &`

vlowther
2017-11-13 21:08
ok

shane
2017-11-13 21:08
FYI - the UX never directly talks to the DRP Endpoint (or the other way-round - endpoint never directly talks to UX)

vlowther
2017-11-13 21:08
Try deleting the --static-ip option

spencerj
2017-11-13 21:09
do I need to stop any services or anything before re-running?

vlowther
2017-11-13 21:11
Yeah, kill dr-provision first. :slightly_smiling_face:

spencerj
2017-11-13 21:12
okay! it seems to have run.. anyway to check values now?

spencerj
2017-11-13 21:13
or should I just try to PXE the machine again?

vlowther
2017-11-13 21:13
yes

spencerj
2017-11-13 21:17
same behavior:
```
dr-provision2017/11/13 22:16:43.501114 Received DHCP packet: type Discover xid 0x67cba53e ciaddr 0.0.0.0 yiaddr 0.0.0.0 giaddr 0.0.0.0 chaddr 00:1e:67:cb:a5:3e
dr-provision2017/11/13 22:16:43.501992 Reservation for 10.0.0.3 has a lease, using it.
dr-provision2017/11/13 22:16:43.504871 xid 0x67cba53e: Discovery handing out: 10.0.0.3 to 00:1e:67:cb:a5:3e via 10.0.0.10
```

spencerj
2017-11-13 21:17
but the system says "No boot filename received".

vlowther
2017-11-13 21:17
ok

vlowther
2017-11-13 21:18
What is the system?

spencerj
2017-11-13 21:18
what do you mean? what is the hardware?

vlowther
2017-11-13 21:18
is it hardware , a VM, etc.

spencerj
2017-11-13 21:18
oh.. hardware. standard Intel server in a rack.

vlowther
2017-11-13 21:19
ok, cool.

vlowther
2017-11-13 21:19
hm.

vlowther
2017-11-13 21:20
Do you just have a reservation for that mac address, or is there a subnet definition as well?

spencerj
2017-11-13 21:20
I defined a subnet 10.0.0.1/16 and specified "Require DHCP Reservation".

spencerj
2017-11-13 21:20
and then I created the reservation using the MAC for the system.

vlowther
2017-11-13 21:21
ok cool..

vlowther
2017-11-13 21:21
You have drpcli on the system?

spencerj
2017-11-13 21:21
yes

vlowther
2017-11-13 21:21
Just saw that in the backscroll

spencerj
2017-11-13 21:21
LOL.. no worries!

vlowther
2017-11-13 21:21
What does drpcli subnets list show?

spencerj
2017-11-13 21:22
``` [root@master dr-test]# drpcli subnets list [GIN] 2017/11/13 - 14:22:05 | 200 | 93.935055ms | 127.0.0.1 | GET /api/v3/subnets [ { "ActiveEnd": "10.0.0.5", "ActiveLeaseTime": 60, "ActiveStart": "10.0.0.2", "Name": "enp4s0f1", "NextServer": "10.0.0.10", "OnlyReservations": true, "Options": [ { "Code": 1, "Value": "255.255.0.0" }, { "Code": 28, "Value": "10.0.255.255" } ], "Pickers": [ "hint", "nextFree", "mostExpired" ], "ReservedLeaseTime": 7200, "Strategy": "MAC", "Subnet": "10.0.0.1/16" } ] ```

vlowther
2017-11-13 21:22
Well, that would do it. :slightly_smiling_face:

vlowther
2017-11-13 21:23
No PXE options there.

spencerj
2017-11-13 21:23
:flushed:

spencerj
2017-11-13 21:23
**facepalm**

spencerj
2017-11-13 21:24
how do I add that? I'm looking at the "Edit" page for the subnet.

vlowther
2017-11-13 21:25
Add an option, code=67, value=lpxelinux.0
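The missing piece can be spotted mechanically. Here is a small sketch that looks for DHCP option 67 (the boot filename) in subnet objects shaped like the `drpcli subnets list` output pasted above; the function names are made up for illustration, and the input is assumed to be that JSON already parsed into Python dicts.

```python
BOOTFILE_OPTION = 67  # DHCP option 67: bootfile name


def bootfile_for(subnet):
    """Return the option 67 value of a subnet object, or None if it is absent."""
    for option in subnet.get("Options") or []:
        if option.get("Code") == BOOTFILE_OPTION:
            return option.get("Value")
    return None


def subnets_without_bootfile(subnets):
    """Names of subnets that hand out leases but no boot filename."""
    return [s.get("Name") for s in subnets if not bootfile_for(s)]


# Trimmed copy of the subnet shown in the chat: only options 1 and 28 are set.
subnet = {
    "Name": "enp4s0f1",
    "Options": [
        {"Code": 1, "Value": "255.255.0.0"},
        {"Code": 28, "Value": "10.0.255.255"},
    ],
}
print(subnets_without_bootfile([subnet]))
```

A subnet that shows up in this list will still answer DHCP, which matches the "No boot filename received" symptom on the PXE client.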

vlowther
2017-11-13 21:25
Here is what mine looks like for reference:

vlowther
2017-11-13 21:25
[ { "ActiveEnd": "192.168.124.254", "ActiveLeaseTime": 60, "ActiveStart": "192.168.124.10", "Available": true, "Enabled": true, "Errors": [], "Name": "docker0", "NextServer": "192.168.124.11", "OnlyReservations": false, "Options": [ { "Code": 3, "Value": "192.168.124.11" }, { "Code": 6, "Value": "192.168.124.11" }, { "Code": 15, "Value": "http://example.com" }, { "Code": 67, "Value": "lpxelinux.0" }, { "Code": 1, "Value": "255.255.255.0" }, { "Code": 28, "Value": "192.168.124.255" } ], "Pickers": [ "hint", "nextFree", "mostExpired" ], "ReadOnly": false, "ReservedLeaseTime": 7200, "Strategy": "MAC", "Subnet": "", "Validated": true } ]

spencerj
2017-11-13 21:26
okay awesome! giving this a try now!

spencerj
2017-11-13 21:26
does this support iPXE and UEFI?

vlowther
2017-11-13 21:27
Not with that filename, you will need something a little more complicated for that. :slightly_smiling_face:

spencerj
2017-11-13 21:28
okay I didn't think so. :slightly_smiling_face: We are using Cobbler right now and I'm evaluating DRP as a replacement.

spencerj
2017-11-13 21:29
we were able to get iPXE and UEFI to work in Cobbler with a little "wizardry" but it's not my favorite solution.

spencerj
2017-11-13 21:32
BINGO! that did it! thank you @vlowther

shane
2017-11-13 21:32
excellent !

shane
2017-11-13 21:32
@spencerj did you use any of the DRP Docs to get started ?

vlowther
2017-11-13 21:33
The neat thing is that the bootfile option is actually a template that can be expanded based on the contents of the DHCP packet.

vlowther
2017-11-13 21:33
I am trying to find our usual example -- been awhile since I used it.

greg
2017-11-13 21:34
bootfile

vlowther
2017-11-13 21:39
If you set option 67 to `{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}}` it will use iPXE if that has already been loaded, otherwise it will use lpxelinux.0 on BIOS systems and elilo on UEFI systems.

shane
2017-11-13 21:39
(thank you @vlowther... for adding back ticks :slightly_smiling_face: )

vlowther
2017-11-13 21:40
Similarly, `{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}ipxe.pxe{{else}}ipxe.efi{{end}}` will force the use of ipxe for BIOS and UEFI systems.
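To make the decision chain in those two templates easier to follow, here is the same logic as a plain Python sketch. It is not how DRP evaluates the template; it only mirrors the if / else-if / else order. Representing the DHCP options as a dict of code to string, and the `force_ipxe` switch between the two variants, are assumptions made for illustration.

```python
USER_CLASS = 77   # DHCP option 77: user class ("iPXE" once iPXE is running)
CLIENT_ARCH = 93  # DHCP option 93: client system architecture ("0" = BIOS)


def pick_bootfile(options, force_ipxe=False):
    """Pick a boot filename the way the two option 67 templates do."""
    if options.get(USER_CLASS) == "iPXE":
        # iPXE is already loaded, so hand it its script.
        return "default.ipxe"
    if options.get(CLIENT_ARCH) == "0":
        # Legacy BIOS client.
        return "ipxe.pxe" if force_ipxe else "lpxelinux.0"
    # Everything else is treated as UEFI.
    return "ipxe.efi" if force_ipxe else "bootx64.efi"


print(pick_bootfile({CLIENT_ARCH: "0"}))
print(pick_bootfile({CLIENT_ARCH: "7"}, force_ipxe=True))
```

Because the iPXE check comes first, a machine that chainloads iPXE gets `default.ipxe` on its second DHCP request regardless of firmware type.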

spencerj
2017-11-13 21:40
:slightly_smiling_face:

vlowther
2017-11-13 21:41
So you can pick and choose what to use based on what has been validated with your gear.

spencerj
2017-11-13 21:41
this looks pretty close to what we are using in Cobbler:
```
if exists user-class and option user-class = "gPXE" {
    filename "$_system_filename";
} else if exists user-class and option user-class = "iPXE" {
    filename "$_system_filename";
} else if exists pxe-system-type and option pxe-system-type != 00:00 {
    filename "ipxe.efi";
} else {
    filename "undionly.kpxe";
}
```

greg
2017-11-13 21:42
You could add in the gpxe template test as well.

vlowther
2017-11-13 21:42
Same idea, different mechanism

spencerj
2017-11-13 21:43
sure!

spencerj
2017-11-13 21:43
I think Cobbler just uses gPXE by default which is why it's there.

vlowther
2017-11-13 21:44
Until someone embeds a nice scripting language into their DHCP server, template expansion or weird config language hacks are the order of the day for this particular task. :slightly_smiling_face:

vlowther
2017-11-13 21:45
That little bind config snippet (and variants of it) have been around since before gpxe was forked to make ipxe.

spencerj
2017-11-13 21:45
LOL

vlowther
2017-11-13 21:45
Back a few product generations when we drove bind like that we did the same thing. :slightly_smiling_face:

spencerj
2017-11-13 21:46
okay so my system booted sledgehammer! yay! is there a way to see the system specs it gathered somewhere?

shane
2017-11-13 21:46
`drpcli machines list`

spencerj
2017-11-13 21:47
yeah I saw that. but it just had basic system info, name, description and UUID... does sledgehammer gather "Facts" like ansible? system specs? memory, cpu info etc...

shane
2017-11-13 21:47
also (if you haven't found it already) - you can append `--format=yaml` to view yaml -vs- json output

spencerj
2017-11-13 21:47
oh nice! that's helpful!

vlowther
2017-11-13 21:49
If you are running on tip you should have a Sledgehammer that has gohai

spencerj
2017-11-13 21:50
I think I ran with "stable": `v3.0.1-tip-20-93fd333f6046a4f49e58720647c31e9b1ed9bf07`

vlowther
2017-11-13 21:50
so part of the machine should be a huge blob containing an ever-growing list of what we consider to be "interesting" hardware and basic config data

shane
2017-11-13 21:50
Um ... we hope not v3.0.1

shane
2017-11-13 21:50
that's ... ancient ...

vlowther
2017-11-13 21:50
That is a rather old stable. :slightly_smiling_face:

spencerj
2017-11-13 21:50
LOL

shane
2017-11-13 21:50
(which doesn't have inventory capabilities)


shane
2017-11-13 21:51
HIGHLY recommend you switch to "latest" for the docs

spencerj
2017-11-13 21:51
ohhhh.

spencerj
2017-11-13 21:51
:stuck_out_tongue:

shane
2017-11-13 21:52
we're about to revision the doc versions shortly - but it's lagging a bit behind ATM

shane
2017-11-13 21:52
you can check your DRP endpoint version with `drpcli info get`

spencerj
2017-11-13 21:53
gotcha! I just saw "stable" and well.. ya know.. it felt "safe"! :stuck_out_tongue_winking_eye:

shane
2017-11-13 21:53
you'd want to chuck an `--upgrade=true` on the end of all of that to do an upgrade (after you kill `dr-provision`)

spencerj
2017-11-13 21:53
ha ha.. I just ran `drpcli info get` and it said "unknown command "info""

vlowther
2017-11-13 21:53
too old for info. :slightly_smiling_face:

spencerj
2017-11-13 21:53
LOL

shane
2017-11-13 21:54
current stable is `v3.2.1-0-2ab654478528d1ee59781f7d53bc8f8b9c6853dd`
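Version strings like the ones in this thread can be compared mechanically. The sketch below assumes the leading `vMAJOR.MINOR.PATCH` part is all that matters for "is this install older than stable" and ignores the suffix after the first dash; the helper names are made up for illustration.

```python
import re

VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Extract (major, minor, patch) from a string like 'v3.2.1-0-2ab6...'."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_older(installed, reference):
    """True if `installed` has a lower major.minor.patch than `reference`."""
    return parse_version(installed) < parse_version(reference)


installed = "v3.0.1-tip-20-93fd333f6046a4f49e58720647c31e9b1ed9bf07"
stable = "v3.2.1-0-2ab654478528d1ee59781f7d53bc8f8b9c6853dd"
print(is_older(installed, stable))
```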

shane
2017-11-13 21:54
you can also shorten your `curl` a lot ....

shane
2017-11-13 21:55
`curl -s get.rebar.digital/stable | bash -s -- install --isolated --upgrade=true`

spencerj
2017-11-13 21:55
well I like that! :slightly_smiling_face:

shane
2017-11-13 21:55
hmm

shane
2017-11-13 21:56
though ... there are significant changes to "content"

shane
2017-11-13 21:56
it's really better if you wipe and restart - since updating the content pieces is a bit of an arduous process

shane
2017-11-13 21:57
I don't think we've done an in-house 3.0.1 to 3.2.1 direct upgrade - we do have upgrade steps listed in Doc - but ...

spencerj
2017-11-13 21:57
no worries!

spencerj
2017-11-13 21:57
I did an "isolated" install so I'll just wipe the dir and start over.

shane
2017-11-13 21:57
have you seen the UX for it yet ?

shane
2017-11-13 21:58
https://<your_drp_endpoint>:8092/

spencerj
2017-11-13 21:58
yes! I LOVE IT!!!!!

shane
2017-11-13 21:58
cool - @zehicle will get a warm fuzzy glow hearing that statement ...

zehicle
2017-11-13 21:59
yes, I did!

spencerj
2017-11-13 21:59
ha ha ha!

spencerj
2017-11-13 21:59
:thumbsup: :thumbsup:

zehicle
2017-11-13 22:00
if you are coming from v3.0 then it's a big jump

spencerj
2017-11-13 22:00
ohh.. I haven't seen the ux for 3.2

spencerj
2017-11-13 22:00
working on the install now.

shane
2017-11-13 22:01
the UX you've seen - is it "green" theme - or "blue" theme ?

spencerj
2017-11-13 22:02
blue theme:

shane
2017-11-13 22:02
ok - that's the new UX :slightly_smiling_face:


spencerj
2017-11-13 22:02
oh okay cool!

shane
2017-11-13 22:02
yep

spencerj
2017-11-13 22:02
:slightly_smiling_face:

spencerj
2017-11-13 22:03
is this going outside my network for anything?

shane
2017-11-13 22:03
your DRP Endpoint never reaches out (nor does the UX reach in to endpoint)

shane
2017-11-13 22:03
your browser is operating in CORS model - basically a "go-between" for the RackN Portal, and connecting to the DRP Endpoint

shane
2017-11-13 22:03
it's a single page React application you run in your browser

spencerj
2017-11-13 22:04
oh okay!

shane
2017-11-13 22:04
connection is `endpoint <-- browser --> rackn portal`

vlowther
2017-11-13 22:04
and that http://rackn.github.io is just where the app part loads from.

shane
2017-11-13 22:04
(had my arrows bass-ackwards - sorry)

shane
2017-11-13 22:05
exactly - there are some "content" Library pieces that rely on the RackN portal service - so your browser will call out to our Portal for things like Contents, Plugins, etc...

shane
2017-11-13 22:06
Authentication is two-part - your Auth to your DRP Endpoint (that's the simple Auth w/ the default "rocketskates" username)

spencerj
2017-11-13 22:06
ohhh.. okay! :slightly_smiling_face:

shane
2017-11-13 22:06
and then the RackN (optional) Portal account for storing and managing your endpoint(s) information and managing them - and the contents you use across your infrastructure

shane
2017-11-13 22:07
again - that's Optional - as Endpoint management will work fine without the RackN Portal account - but you lose access to the advanced workflow management pieces w/out the Portal account

spencerj
2017-11-13 22:08
Are you guys collecting data/metrics/telemetry based on access to the Portal (even for non-RackN accounts)? I work for Intel so I gotta ask the "security" questions! :stuck_out_tongue:

shane
2017-11-13 22:08
we have some training slide decks that might be interesting for you: Feature Landscape: https://goo.gl/GYtwNS Installation: https://goo.gl/BoQG8J Configuration: https://goo.gl/BzJzTP Content Introduction: https://goo.gl/LChN6r Understanding Stages: https://goo.gl/iUjNNJ

spencerj
2017-11-13 22:12
awesome! I'll look all of this over.


spencerj
2017-11-13 22:16
not sure if this is a bug.. I was just trying to update the MAC for the reservation but it wont let me.

spencerj
2017-11-13 22:19
also... I ran the new install command: `curl -s get.rebar.digital/stable | bash -s -- install --isolated --upgrade=true` but after drpcli still shows v3.0.1

shane
2017-11-13 22:30
so - you must have an older version binary that's getting started up from a previous install ?

shane
2017-11-13 22:30
did you do a "production" mode install that put a `dr-provision` binary in `/usr/local/bin` ?

shane
2017-11-13 22:31
(and presumably, you're running `drpcli` on the same node as the `dr-provision` binary - the DRP Endpoint) ?

spencerj
2017-11-13 22:32
yes, drpcli is on the same node as dr-provision.. checking /usr/local/bin

shane
2017-11-13 22:32
in _isolated_ install mode - the `dr-provision` binary should be installed as: `bin/linux/amd64/dr-provision`

shane
2017-11-13 22:32
if you run that binary w/ `--version` flag, what does it spit out ?

shane
2017-11-13 22:33
```root@demo:~$ bin/linux/amd64/dr-provision --version dr-provision2017/11/13 22:32:05.195814 Version: v3.2.1-0-2ab654478528d1ee59781f7d53bc8f8b9c6853dd```

spencerj
2017-11-13 22:33
whoa... I guess I was on this same system back in May!! LOL.. whoops!

spencerj
2017-11-13 22:33
drpcli and dr-provision both in /usr/local/bin.. LOL

spencerj
2017-11-13 22:33
I guess I'll delete those and start over.

shane
2017-11-13 22:33
yeah - that would be what we call "production" install mode

shane
2017-11-13 22:34
I'd recommend wiping (or at least archiving off - if you were previously using them and worried about preserving) the following: ```/usr/local/bin/dr-provision /usr/local/bin/drpcli /var/lib/dr-provision/ /var/lib/tftpboot/```

shane
2017-11-13 22:35
those are the default paths that v3.0.1 used - so adjust accordingly if you installed in a different location

spencerj
2017-11-13 22:35
okay! all cleaned up!

shane
2017-11-13 22:36
also - there might be an ```/etc/systemd/system/dr-provision``` start up script that should be checked

shane
2017-11-13 22:36
(or other appropriate init script)

spencerj
2017-11-13 22:45
okay I think I'm back up! :slightly_smiling_face:

spencerj
2017-11-13 22:45
``` "version": "v3.2.1-0-2ab654478528d1ee59781f7d53bc8f8b9c6853dd" ```

shane
2017-11-13 22:45
Yay! Welcome to the modern Era !

spencerj
2017-11-13 22:46
LOL

greg
2017-11-13 22:46
lol


spencerj
2017-11-13 22:48
booting to sledgehammer but it looks like it's bombing out.

greg
2017-11-13 22:49
Make sure that your static-ip is not set (and if its not, you may need to specify it).

greg
2017-11-13 22:49
It appears that it is using the static-ip fallback

shane
2017-11-13 22:49
wasn't your DRP endpoint 10.0.0.10 ?

spencerj
2017-11-13 22:49
LOL.. yeah it's not set, or at least I didn't specify one for the initial dr-provision command.

greg
2017-11-13 22:50
192.168.124.11 is our default thing

spencerj
2017-11-13 22:50
the DRP host has 2 NICs.. one routeable and one private.

shane
2017-11-13 22:50
`ps -ef | grep dr-provision | grep -v grep` (plz)

spencerj
2017-11-13 22:50
I only want to listen on the private interface. but I want to access UX on routable.

shane
2017-11-13 22:51
the DRP Endpoint doesn't need access to public internet/RackN Portal for UX

shane
2017-11-13 22:51
you just have to be able to reach the DRP Endpoint from your laptop/desktop/whatever

spencerj
2017-11-13 22:51
``` root 15445 13994 3 15:38 pts/0 00:00:27 ./dr-provision --base-root=/root/dr-test/drp-data --local-content= --default-content= ```

greg
2017-11-13 22:52
so, we try and guess, but sometimes we get it wrong. The `static-ip` is the fallback if we can't guess the outbound interface.

greg
2017-11-13 22:52
Sooo - `--static-ip=10.0.0.10`

spencerj
2017-11-13 22:52
oh okay!

spencerj
2017-11-13 22:52
got it!

greg
2017-11-13 22:53
We may want to re-evaluate that code. It seems like it may not be working correctly.

spencerj
2017-11-13 22:54
so to be clear.. the static-ip should be the interface targeted for DHCP/PXE traffic?

shane
2017-11-13 22:54
yep
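On a multi-homed host, one way to work out which address belongs in `--static-ip` is to pick the local address that sits inside the provisioning subnet. This is only a sketch of that idea, not something DRP does for you; the routable address `192.0.2.20` is a placeholder, and the list of local addresses is assumed to be collected by hand.

```python
import ipaddress


def pick_static_ip(local_addresses, subnet_cidr):
    """Return the first local address that falls inside the provisioning subnet."""
    # strict=False accepts a CIDR with host bits set, such as "10.0.0.1/16".
    network = ipaddress.ip_network(subnet_cidr, strict=False)
    for address in local_addresses:
        if ipaddress.ip_address(address) in network:
            return address
    return None


# One routable (placeholder) address and the private one from the chat.
print(pick_static_ip(["192.0.2.20", "10.0.0.10"], "10.0.0.1/16"))
```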

spencerj
2017-11-13 22:54
okay cool!

zehicle
2017-11-13 23:00
@spencerj we collect the "drpcli info get" information about endpoints that connect to the UX. That's what we use to determine the features that can be enabled and if there are any version warnings.

spencerj
2017-11-13 23:00
oh okay cool!

zehicle
2017-11-13 23:00
we don't store any passwords, content or other data about the endpoint.

spencerj
2017-11-13 23:01
sweet thanks!

spencerj
2017-11-13 23:01
so now that I'm running the latest stuff on 3.2. Is there a way to see the info that sledgehammer collected? does it collect "facts" like ansible?

shane
2017-11-13 23:02
`drpcli machines list`

shane
2017-11-13 23:03
or, get a list of Machine names: `drpcli machines list | jq '.[].Name'` `"snoopy"` and then show a specific machine: `drpcli machines show snoopy`

shane
2017-11-13 23:03
the Machines menu entry in the UX also shows the inventory information

shane
2017-11-13 23:07
to dump *just* the inventory for a given machine: `drpcli machines list | jq -r '.[] | "\(.Name) \(.Uuid)"'` `snoopy 80b86604-be25-4f27-ba0b-f8382db42b96` then use the UUID to get the inventory for the given machine: `drpcli machines get 80b86604-be25-4f27-ba0b-f8382db42b96 param gohai-inventory`
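For scripting without `jq`, the same name and UUID listing can be pulled out of the JSON in a few lines. This sketch assumes the text passed in is the JSON array printed by `drpcli machines list`, with the `Name` and `Uuid` fields used in the `jq` filter above; how the text is captured (pipe, file, subprocess) is left to the caller.

```python
import json


def names_and_uuids(machines_json):
    """Return (Name, Uuid) pairs from a `drpcli machines list` JSON array."""
    return [(m["Name"], m["Uuid"]) for m in json.loads(machines_json)]


sample = '[{"Name": "snoopy", "Uuid": "80b86604-be25-4f27-ba0b-f8382db42b96"}]'
for name, uuid in names_and_uuids(sample):
    print(name, uuid)
```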

zehicle
2017-11-13 23:09
@spencerj RE inventory... there are params on the machine AND params from the profiles that are on the machine (including global by default). So the "inventory" per machine merges both together when it expands params in templates
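A rough sketch of that merge, to show what "params on the machine AND params from the profiles" can mean in practice. The precedence used here (machine params win over profile params, earlier profiles win over later ones, global comes last) is an assumption that the chat does not spell out, and the param names are invented.

```python
def effective_params(machine_params, profile_params_list, global_params):
    """Merge params into one dict; the precedence order is an assumption."""
    merged = dict(global_params)
    # Apply profiles last-to-first so that earlier profiles override later ones.
    for profile_params in reversed(profile_params_list):
        merged.update(profile_params)
    # Params set directly on the machine override everything else.
    merged.update(machine_params)
    return merged


# Hypothetical param names, for illustration only.
global_params = {"ntp-server": "global", "proxy": "global", "timezone": "global"}
profiles = [{"proxy": "profile1"}, {"proxy": "profile2", "timezone": "profile2"}]
machine = {"timezone": "machine"}
print(effective_params(machine, profiles, global_params))
```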

spencerj
2017-11-13 23:18
awesome thanks!

spencerj
2017-11-13 23:19
now who can tell me about the RAID/BIOS capabilities? what does this mean? can DRP automate RAID configuration?

spencerj
2017-11-13 23:20
specifically I'm asking about HW raid controllers like LSI.

shane
2017-11-13 23:21
yes - we can, however - it's not yet 100% baked - @vlowther has been working on the RAID capabilities - porting the MegaRAID tools (which support LSI controllers) from our older DRv2 product to our current DRPv3

spencerj
2017-11-13 23:22
okay awesome!

shane
2017-11-13 23:22
same story - but I believe the porting is a lot further behind ... for BIOS/Firmware capabilities

shane
2017-11-13 23:22
note that RAID/BIOS stuff are "premium" features that are paid content pieces w/ RackN - not part of the OpenSource provisioning pieces

spencerj
2017-11-13 23:24
okay cool! and yeah.. I saw that on the RackN pricing page. so what exactly do we get for the "$1 per server per month" ? obviously not the RAID/BIOS stuff, but what are the main added features with the paid plan?

spencerj
2017-11-13 23:24
I think you, or someone mentioned "Control Workflow" earlier? is that not included with DRP?

zehicle
2017-11-13 23:26
that base is for RackN support of the open source.

zehicle
2017-11-13 23:26
the control workflow was moved into the open source for v3.2

spencerj
2017-11-13 23:27
OH.. SWEET!

zehicle
2017-11-13 23:28
you are right, RAID/BIOS, metal IPMI, direct to disk imaging, licensed O/Ses, etc are a la carte pricing

spencerj
2017-11-13 23:29
okay cool! thank you!

spencerj
2017-11-13 23:30
and thank you to everyone who's chimed in today! SUPER helpful!

shane
2017-11-13 23:41
no problem ! let us know if you run into any other issues ...

shane
2017-11-14 20:04
( @chriscowley ) - we now have an arm64 Linux build in our "tip" version... NOTE - this is extremely minimally tested (i.e. I chucked it on an arm64 centos7 platform in http://packet.net ... and it worked, but YMMV) - please treat it like "alpha" feature. To install - use "tip" version, like: `curl -s get.rebar.digital/tip | bash -s -- install --isolated --drp-version=tip`

2017-11-15 07:51
@rackneng I'll try it on oDroid C2 when I get a chance. The Rpi is not ARM64 though - it is ARMv7 :-)

2017-11-15 07:51
While ARM64 suits me, I think the wider community would benefit more from an ARMv7 build

shane
2017-11-15 15:21
@chriscowley - what does `uname -m` on those v7 platforms return?

yusuf.hussein
2017-11-15 18:10
hello

yusuf.hussein
2017-11-15 18:10
can i assign a static ip for guest vm

shane
2017-11-15 18:20
@yusuf.hussein - yes, you would use a "Reservation" to do that - but you do need to know "some info" about your Guest VM to assign an IP to it

shane
2017-11-15 18:20
usually that's the MAC address

yusuf.hussein
2017-11-15 18:23
for one server we can have mac address

yusuf.hussein
2017-11-15 18:23
what if for more than 100

shane
2017-11-15 18:25
the (minimal) UX doc we have is: http://provision.readthedocs.io/en/latest/doc/ui.html#reservations or - via the `drpcli` command line, you could do: ```echo '{ "Addr": "1.2.3.4", "Available": true, "NextServer": "1.2.3.10", "Options": [], "ReadOnly": false, "Strategy": "MAC", "Token": "00:0c:3f:f1:13:d3" }' > my_reservation.json drpcli reservations create -< my_reservation.json```

shane
2017-11-15 18:25
you could allow the machines to be discovered automatically by DRP - then convert an existing Lease to a static reservation - if you don't want to collect the MAC addrs of all of your Machines
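If the MAC addresses are already known, for example from an inventory spreadsheet, the reservation objects for 100+ machines can be generated instead of typed. The sketch below only builds the JSON, using the fields from the single-reservation example above; the function name and the input format (pairs of MAC and IP) are assumptions, and each object would still be created one at a time with the `drpcli reservations create` command shown in that example.

```python
import json


def build_reservations(hosts, next_server):
    """Build one reservation object per (mac, ip) pair, keyed on the MAC address."""
    return [
        {
            "Addr": ip,
            "NextServer": next_server,
            "Options": [],
            "Strategy": "MAC",
            "Token": mac,
        }
        for mac, ip in hosts
    ]


hosts = [("00:0c:3f:f1:13:d3", "1.2.3.4"), ("00:0c:3f:f1:13:d4", "1.2.3.5")]
for reservation in build_reservations(hosts, "1.2.3.10"):
    print(json.dumps(reservation))
```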

yusuf.hussein
2017-11-15 18:38
ok thank you

yusuf.hussein
2017-11-15 18:38
let me try

yusuf.hussein
2017-11-15 22:57
is there going to be any impact on our existing DHCP server if we add these parameters

yusuf.hussein
2017-11-15 22:57
set next-server 192.168.19.79 (the RackN server) and set filename "lpxelinux.0"

zehicle
2017-11-16 01:27
@yusuf.hussein the system can also be set to reserve the ip after assignments. So it keeps getting the same address after the first dhcp

zehicle
2017-11-16 01:29
Oh, @shane said the same thing earlier

mprzyjazny
2017-11-16 04:57
has joined #json

2017-11-16 08:02
@rackneng `uname -m` returns `armv7l` on a R-Pi and `aarch64` on an oDroid

2017-11-16 08:03
@rackneng (I reckon a Pine64 will also return `aarch64`)
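
The `uname -m` values reported here can be mapped to a build label before picking a binary. A minimal sketch, assuming the labels follow the names used elsewhere in this channel (`amd64` as in `drpcli.amd64.linux`, plus `arm64` and `arm_v7`); the function name is hypothetical and the real install script may do this differently.

```shell
# Map `uname -m` output to a build-architecture label.
# The labels are assumptions taken from names mentioned in this channel.
drp_arch() {
    case "$1" in
        x86_64)         echo amd64 ;;
        aarch64|arm64)  echo arm64 ;;     # oDroid C2, probably Pine64
        armv7l)         echo arm_v7 ;;    # Raspberry Pi (32-bit)
        *)              echo "unsupported: $1" >&2; return 1 ;;
    esac
}

# Example: drp_arch "$(uname -m)"
```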

2017-11-16 08:53
This may be in the docs, but I haven't seen it. Are you capable of deploying a Windows client OS?

greg
2017-11-16 12:26
Not in community. You would need to work with RackN. The problem is that we are still building up the patterns for that. Also it depends upon your starting windows position.

shane
2017-11-16 14:07
@chriscowley - thx for the uname's

yusuf.hussein
2017-11-16 15:02
@yusuf.hussein pinned a message to this channel.

dongluo.chen
2017-11-16 20:01
has joined #json

lae
2017-11-16 21:31
hmm...something in a recent update is causing our builds to fail here... ``` + /usr/local/bin/drpcli machines processjobs e6de1551-4be6-4b9c-b4bb-d960b39a2421 Segmentation fault ```

shane
2017-11-16 21:33
@lae are you using "tip"? what version?

shane
2017-11-16 21:33
can you do a `strings` on the drpcli binary ?

lae
2017-11-16 21:34
iteration is taking a while

shane
2017-11-16 21:34
sorry - don't do "strings" :slightly_smiling_face:

lae
2017-11-16 21:34
oh wait right it's on the drp server

lae
2017-11-16 21:34
hold on

shane
2017-11-16 21:34
I meant `file` to make sure binary architecture is right

lae
2017-11-16 21:34
so it's too late for me to get the original version that segfaulted

lae
2017-11-16 21:35
583d0f24e1fd02140603ce096421467f /var/lib/dr-provision/tftpboot/files/drpcli.amd64.linux but this was the md5sum

lae
2017-11-16 21:35
I updated to current tip and the segfault still occurs

lae
2017-11-16 21:36
and also yeah I checked that the arch was right: ``` + uname -a Linux (none) 4.1.15 #1 SMP Sun Aug 6 23:01:53 PDT 2017 x86_64 GNU/Linux ```

lae
2017-11-16 21:36
Right now I'm trying a provision with 3.2.1's binary

lae
2017-11-16 21:39
3.2.1 also segfaulted (trying 3.1.0 now)

lae
2017-11-16 21:40
I think the one that was previously working with this bootenv/stage was tip sometime after 3.1.0 :v

shane
2017-11-16 21:41
ok - I just deployed `tip` version on x86_64 linux (centos 7) - no problems

shane
2017-11-16 21:42
```root@demo:~/foo$ uname -a Linux 5min-drp-ewr1-00 3.10.0-327.22.2.el7.x86_64 #1 SMP Thu Jun 23 17:05:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux root@demo:~/foo$ ./drpcli version Version: v3.2.1-tip-41-6894ee85c5c018192ba9ce9b7378fd0fece724d7```

lae
2017-11-16 21:42
yeah centos 7 worked fine for me (used it for deploying atomic yesterday)

lae
2017-11-16 21:43
this is an in-house distro

shane
2017-11-16 21:43
cool

lae
2017-11-16 21:43
hmm

lae
2017-11-16 21:43
it still segfaulted

shane
2017-11-16 21:43
I did add some armv8 (aarch64) and armv7 (armv7l) architecture builds within the last few days - that includes modifications to the "install.sh" script to stick the "right" binaries in place for a "deployment" install

lae
2017-11-16 21:44
can something wrong with the tasks cause the segfault, you think?

shane
2017-11-16 21:44
I wouldn't think tasks or plugins would cause `drpcli` issues - maybe `dr-provision` binary ...

shane
2017-11-16 21:45
are you running the binary via PATH locating - or directly via fully qualified path ?

lae
2017-11-16 21:45
dr-provision?

lae
2017-11-16 21:45
or drpcli

shane
2017-11-16 21:45
it's drpcli you said that segfaulted, right ?

lae
2017-11-16 21:45
correct

lae
2017-11-16 21:46
`/usr/local/bin/drpcli machines processjobs "e6de1551-4be6-4b9c-b4bb-d960b39a2421"`

shane
2017-11-16 21:48
ok - so this is occurring on a Machine you're trying to provision during a stage, right ?

lae
2017-11-16 21:49
correct

shane
2017-11-16 21:49
are you able to manually run `drpcli` on the machine - say, just w/ the "version" flag ?

shane
2017-11-16 21:49
does it segfault then ?

shane
2017-11-16 21:50
or any other non "processjobs" actions

lae
2017-11-16 21:50
not interactively, I can add it to the template though, hold on

lae
2017-11-16 21:53
well, I added it but I also had redeployed dr-provision 3.2.1

lae
2017-11-16 21:53
so we'll see

shane
2017-11-16 21:53
ok

lae
2017-11-16 21:57
still segfaults

lae
2017-11-16 21:57
with version

shane
2017-11-16 21:58
you have access to the Machine ?

vlowther
2017-11-16 21:58
hm. Any stacktraces?

shane
2017-11-16 21:58
yeah - was hoping you could try and capture one off of the machine

lae
2017-11-16 21:58
access as in?

shane
2017-11-16 21:59
ssh

shane
2017-11-16 21:59
console

lae
2017-11-16 21:59
technically I'm attached to console, though it'll be a bit more effort to bring up an interactive one within this image

lae
2017-11-16 22:00
hold on lemme try something

lae
2017-11-16 22:01
the image is also probably going to be really limited though, I don't think strace will be in it

vlowther
2017-11-16 22:04
If it is not spitting out a stacktrace on the console, check dmesg to see if it has anything weird.

vlowther
2017-11-16 22:05
also, is it happening on just one machine or on more than one?

lae
2017-11-16 22:14
I need to make a separate bootenv/stage - will report soon

vlowther
2017-11-16 22:25
Also, if you have a core dump, I suppose now is as good a time as ever to learn how gdb and go binaries interact. :slightly_smiling_face:

vlowther
2017-11-16 22:25
And by now, I of course mean after I get home. :slightly_smiling_face:

lae
2017-11-16 22:39
alright

lae
2017-11-16 22:40
I should have paid more heed to this particular message:

lae
2017-11-16 22:40
``` # wget -O drpcli "$ProvURL/files/drpcli.amd64.linux" Connecting to 10.11.110.50:8091 (10.11.110.50:8091) wget: short write ```

lae
2017-11-16 22:41
```
# mount -t tmpfs -o size=1g tmpfs /tmp
# cd /tmp
# wget -O drpcli "$ProvURL/files/drpcli.amd64.linux"
Connecting to 10.11.110.50:8091 (10.11.110.50:8091)
drpcli 100% |*******************************| 20885k 0:00:00 ETA
# md5sum drpcli
3200370360a384e28bd3ca3a54d2e5e8 drpcli
# chmod +x drpcli
# ./drpcli version
Version: v3.2.1-0-2ab654478528d1ee59781f7d53bc8f8b9c6853dd
```

lae
2017-11-16 22:41
so basically / filled up when fetching drpcli

shane
2017-11-16 22:43
oye !

shane
2017-11-16 22:44
jeesh - drpcli is a pretty slim binary !!

lae
2017-11-16 22:45
there's too much code I didn't write to actually look through, but this is the first time I was running this bootenv (fireeye's appliance OS's manufacturing images) on this particular model of appliance, and I know that the manufacturing image does check for model and stuff so it might have, idk, made / a smaller fs than for other appliances that I know it's worked on

lae
2017-11-16 22:46
I actually had to make /tmp a tmpfs in order to run processjobs for other appliances anyway because of the code it pulled, might as well just drop drpcli there instead of /usr/local/bin

lae
2017-11-16 22:46
(due to lack of space)

lae
2017-11-16 22:47
time to upgrade drp back to tip

shane
2017-11-16 22:48
cool - glad it was a simple environment issue !
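
The root cause in this thread was a truncated download (`wget: short write`) that was then executed. A template could guard against that by checking both the download status and a checksum before running the binary. This is a sketch only: the function names are hypothetical, it assumes `md5sum` exists in the image, and it assumes the image's `wget` exits non-zero on a short write, which should be confirmed.

```shell
# Return 0 only if the file's MD5 digest matches the expected value.
verify_md5() {
    file=$1
    expected=$2
    actual=$(md5sum < "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Download drpcli into a directory with enough free space (for example a
# tmpfs) and refuse to mark it executable if anything looks wrong.
# $ProvURL is the same variable used in the session above.
fetch_drpcli() {
    dest_dir=$1
    expected_md5=$2
    wget -O "$dest_dir/drpcli" "$ProvURL/files/drpcli.amd64.linux" || return 1
    verify_md5 "$dest_dir/drpcli" "$expected_md5" || return 1
    chmod +x "$dest_dir/drpcli"
}

# Example, using the checksum of the working v3.2.1 download shown above:
# fetch_drpcli /tmp 3200370360a384e28bd3ca3a54d2e5e8
```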

ekrueger
2017-11-16 23:56
has joined #json

lae
2017-11-17 00:14
so one thing i noticed but haven't had much chance to look into is that if the bootenv/stage is set to sledgehammer when a Machine boots up then it doesn't install SSH keys

lae
2017-11-17 00:15
I think that step might be missing on the sledgehammer stages specifically? (something about it being in control.sh in the discovery bootenv but that doesn't get used by sledgehammer last I looked)

shane
2017-11-17 01:10
@ekrueger welcome

shane
2017-11-17 01:10
ssh keys are not installed until after OS (bootenv) is installed - we do have an option to inject SSH keys in to Sledgehammer live boot - but it doesn't do that by default

shane
2017-11-17 01:10
@lae ^^

shane
2017-11-17 03:55
@chriscowley - I have an armv7 build done - it's not pushed for deployment yet - but if you have hardware you want to play with it on - let me know

greg
2017-11-17 20:53
@justin and @yusuf.hussein - this one is for you and all the others in the . :slightly_smiling_face: Tip has been updated to have DHCP Proxy support.

shane
2017-11-17 20:55
woot !

greg
2017-11-17 20:55
You can now build a subnet configuration (mostly so we don't do it for everything) and set the Proxy flag to true. I don't have a UI part for this, yet. You will need to use the CLI. UX coming.

greg
2017-11-17 20:56
`drpcli subnets update mysubnet '{ "Proxy": true }'`

greg
2017-11-17 20:57
This will turn the subnet into something that sends PXE client proxy DHCP messages to hosts to send boot information. The important options (though others can be sent) are the bootfile (67) and the nextserver (DRP Endpoint IP).

greg
2017-11-17 20:57
I've tested this in virtualbox. I'd be interested in how this works for y'all.

shane
2017-11-17 20:58
and for @chriscowley - we have arm64 (v8) and arm_v7 (32 bit - armv7l) builds in the release

greg
2017-11-17 20:58
oh - yeah - that is in tip as well. :slightly_smiling_face:

shane
2017-11-17 20:59
So ... RaspberryPI and ODroid on, my friends !! (and anything else ARMy)

greg
2017-11-17 21:11
UX can now set proxy as well.

yusuf.hussein
2017-11-17 21:31
Thanks greg

yusuf.hussein
2017-11-17 22:38
i am getting error when i am turning on proxy

yusuf.hussein
2017-11-17 22:38
[root@rackn dr-provision-install]# ./drpcli subnets update mysubnet '{ "Proxy": true }' Error: GET: subnets/mysubnet: Not Found

yusuf.hussein
2017-11-17 22:38
not sure if am doing correct

shane
2017-11-17 22:39
"mysubnet" refers to a subnet you create in a previous step

shane
2017-11-17 22:39
your subnet name will likely be different

yusuf.hussein
2017-11-17 22:41
thanks shane

yusuf.hussein
2017-11-17 22:41
my fault

yusuf.hussein
2017-11-17 22:41
it works

shane
2017-11-17 22:42
`drpcli subnets list | jq '.[].Name'`

shane
2017-11-17 22:42
that gives the name of all subnets on your DRP endpoint

yusuf.hussein
2017-11-17 22:54
what is jq ?

vlowther
2017-11-17 22:59
jq is a JSON swiss army knife.

vlowther
2017-11-17 22:59

vlowther
2017-11-17 23:00
We use it basically everywhere we need to mess with JSON on the commandline
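
To make the `jq` filter above concrete, here is a small sketch run against made-up data instead of a live endpoint. The subnet names and the sample JSON are invented for illustration; real `drpcli subnets list` output has many more fields. `Name` and `Proxy` are the two fields mentioned in this channel.

```shell
# Sample data standing in for `drpcli subnets list` output (invented values).
subnets='[ { "Name": "lab-net", "Proxy": true }, { "Name": "mgmt-net", "Proxy": false } ]'

# Print every subnet name, one per line (-r drops the JSON quotes).
subnet_names() {
    printf '%s\n' "$1" | jq -r '.[].Name'
}

# Print only the names of subnets that have the Proxy flag set.
proxy_subnets() {
    printf '%s\n' "$1" | jq -r '.[] | select(.Proxy == true) | .Name'
}

# Example: subnet_names "$subnets"
```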

yusuf.hussein
2017-11-17 23:07
thank you

wdennis
2017-11-20 02:21
3-

shane
2017-11-20 02:21
7*

zehicle
2017-11-20 03:27
21-

wdennis
2017-11-20 03:30
New math :stuck_out_tongue_winking_eye:

shane
2017-11-20 03:46
Really? @zehicle you had to edit your answer? :face_with_rolling_eyes:

zehicle
2017-11-20 03:46
I'm rusty on my reverse polish

shane
2017-11-20 03:46
lol

zehicle
2017-11-20 03:47
although, my original answer is what would have been the output

wdennis
2017-11-20 14:08
Hi gang :wave: Can we talk about imaging to disk as an OS deployment method?

greg
2017-11-20 14:26
We can.

wdennis
2017-11-20 14:59
So, we have an OS image deployment process that goes like this:

wdennis
2017-11-20 15:01
1) Make a "gold-master" OS install image on a small HDD via manual config/Ansible, and then saving an image via Clonezilla (FOSS image capture/restore program)

wdennis
2017-11-20 15:01
2) Image the target system disk (almost always much bigger than the image) with aforementioned gold master image

wdennis
2017-11-20 15:02
3) ^^^ is done via a "LiveCD" ram OS, usually machine booted via a USB key, but also could be PXE - Then after the image is laid down on the machine's internal HDD, must manually fire up GParted, and resize the partitions to a) create appropriate swap part/n, and b) resize other existing partition to appropriately fill the target disk

wdennis
2017-11-20 15:03
4) Once that is all done, run Ansible from the Live CD (pulling down playbooks from Git) and run them against the target disk which is mounted chroot (which is interesting, as some Ansible "facts" relate to the in-mem OS, not the target disk OS... So we have to set/use custom facts)

wdennis
2017-11-20 15:04
So, do you folks have a better way to deploy an image to a different size target disk? Or do you assume the target disk is same size as image disk?

ctrees
2017-11-20 16:29
@wdennis so why not PXE a 'release' image, then just ansible it via workflow ?

ctrees
2017-11-20 16:31
I ask as I seem to be messing with the same workflow as you. In my case, I'm attempting to match the Universities 'training of students' to some governmental 'service' companies

greg
2017-11-20 16:31
@wdennis - I'll try and get back to this as well. On call for a while.

shane
2017-11-20 16:32
@ctrees one big reason not to do it that way - if you do installs based on PKG systems - you must host and carefully control every single package in your own hosted repo. Otherwise, you end up with unknown version numbers and releases of pkgs installed on the Machine.

shane
2017-11-20 16:32
in some environments - especially with large scale systems (1000s to 10000s) - you can NOT afford to be uncertain about an OS deployment with various versions of PKGs installed - and subtle bugs/interactions with your application and services

ctrees
2017-11-20 16:35
Yea, I figured so (which is why I'm looking into this also) but unless you normalize H/W (which is almost impossible now-days)... you still end up with a blizzard of snowflakes :wink:

ctrees
2017-11-20 16:39
I'm debating on where to cut the boundaries for 'the next major update' for these governmental systems but make sure the University grads can service the infra... but what @wdennis describes seems to be what the 'current pattern' is for the companies... storage seems to be the PIA

shane
2017-11-20 16:43
the Image deployment pattern is a major component of Immutable Infrastructure - which is a pattern popularized (not invented) in Cloud - with instantiating a VM - use it, update your VM images - then blow away VM, and re-instantiate it ... sort of workflow

shane
2017-11-20 16:44
there have always been similar Image (both raw image and filesystem image) capabilities in several deployment tools for decades that follow the "Gold Image" pattern and Immutable Infrastructure - what you do **after** the initial install is what dictates whether you're following those principles or not

shane
2017-11-20 16:44
if you PKG upgrade things in place after install - you are NOT following Immutable Infrastructure patterns

shane
2017-11-20 16:44
if you nuke a machine and redeploy when you need to update - you are

shane
2017-11-20 16:45
how you get to a common set of OS, supporting pkgs, and apps - can vary - but the goal is to guarantee WHAT is installed (precisely), along with how you operate that infra after the fact

wdennis
2017-11-20 17:59
@shane @ctrees I wouldn't call what we are doing "immutable" - it's just an OS install acceleration mechanism in our case

wdennis
2017-11-20 18:00
"Immutable" to me means "read-only"

wdennis
2017-11-20 18:01
Maybe with COW technology, one could deploy an OS image which would be immutable, but then changes, OS state persisted to files, etc would go into a writeable layer on top

zehicle
2017-11-20 18:02
to me "immutable" means only initial configuration. No patch/upgrade. so, it's read only after the initial configuration.

wdennis
2017-11-20 18:02
Not sure if that's a thing on bare metal

zehicle
2017-11-20 18:03
even cloud immutable & containers get initial configuration before they start taking workload. once they are running, there's nothing in the system that cannot be thrown away

wdennis
2017-11-20 18:03
@zehicle Enforced read-only, or by convention?

zehicle
2017-11-20 18:03
by convention

zehicle
2017-11-20 18:03
and the fact that the systems can be destroyed at any time

wdennis
2017-11-20 18:04
Yes, ok

wdennis
2017-11-20 18:04
Not like a memory-booted system then, but actual bits on disk?

zehicle
2017-11-20 18:05
either way. same effect.

zehicle
2017-11-20 18:05
I did a post about this.... looking it up


wdennis
2017-11-20 18:07
Gentlemen's agreement that local machine state past initial boot/provisioning is ephemeral, right?

wdennis
2017-11-20 18:08
I see the utility in that

shane
2017-11-20 18:12
Immutable Infrastructure as applied to OS and App deployment (generally) separates out the state in 3 layers:
1. OS deployment guaranteed to be the same across all deployments/machines
2. OS state is separate (this includes configuration elements to make a VM or Machine "operate" correctly)
3. application state separated out from deployment/provisioning activities
Usually - the "application state" part is via "cloudy" based services (eg highly replicated technologies) - such that any given Application instance can be destroyed, and the state can "carry on" beyond the death of the individual VM/Machine

shane
2017-11-20 18:12
you can consider these "layers", but they don't have to be enforced in a layered filesystem model - but Containers very much adhere to this principle with layers

wdennis
2017-11-20 18:17
We actually have another cluster here wherein the base OS comes in via PXE, and runs memory-resident, but the nodes also have disk in them to persist certain state

greg
2017-11-20 18:52
Well, immutable infrastructure notwithstanding, I've started looking at your steps, @wdennis. I think those are possible and I've started that process. With regard to immutable or not: I think DRP should enable all the insanity possible, but encourage best practices. Soooo, all 4 steps are actually allowed and done in some of the experimental imaging stuff I've been doing.

ctrees
2017-11-20 18:56
... I'll sign up to build, use, document and rebuild the path blazed by @greg and @wdennis :wink: I've been coming up to speed on sphinx

shane
2017-11-20 18:57
Sweet. I've updated the document on documentation in "latest"

ctrees
2017-11-20 19:02
ok... I think I follow the doc patterns... I used your 5min demo for both doc gray and mailservices, they had a few general questions like 'why sphinx' and why not just md (as kubespray seems to use)... then github has pages.github.com.... (they really don't care that much other than 'choice motivation')

wdennis
2017-11-20 20:04
@greg The sticking point with our imaging process is the manual resizing of partitions after imaging the target disk (as well as creating a swap part'n based on RAM size) - love to hear any ideas for automating that
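
The "swap based on RAM size" half of this is easy to script. The sketch below only computes a size; it does not touch any disk. The sizing policy (2x RAM up to 2 GiB, 1x RAM up to 8 GiB, capped at 8 GiB above that) is an assumption chosen for illustration, not something stated in this channel, and the function names are hypothetical.

```shell
# Suggest a swap size in MiB for a given amount of RAM in MiB.
# Policy is an assumption: 2x RAM up to 2 GiB, 1x up to 8 GiB, then cap at 8 GiB.
swap_mib_for_ram() {
    ram_mib=$1
    if [ "$ram_mib" -le 2048 ]; then
        echo $((ram_mib * 2))
    elif [ "$ram_mib" -le 8192 ]; then
        echo "$ram_mib"
    else
        echo 8192
    fi
}

# RAM of the running machine in MiB, read from /proc/meminfo (Linux only;
# MemTotal is reported in kB).
ram_mib_now() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Example: swap_mib_for_ram "$(ram_mib_now)"
```

The computed value could then feed whatever partitioning step replaces the manual GParted session; that step depends on the disk layout and is left out here.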

greg
2017-11-20 20:08
Okay - soooo - I've started working on this by using the CoreOS ignition tool with some helpers.

greg
2017-11-20 20:08
It also depends upon what imaging technique you are using.

greg
2017-11-20 20:09
well - backing up.

greg
2017-11-20 20:09
First, This can be done as a task in sledgehammer.

wdennis
2017-11-20 20:12
We use "Clonezilla" live via PXE-boot ( http://clonezilla.org )

wdennis
2017-11-20 20:20
@greg Is CoreOS Ignition usable as a post-imaging tool? Looks to me from what I'm reading as a pre-install prep tool, as well as an installer...

greg
2017-11-20 20:27
I've added imaging capabilities to it.

greg
2017-11-20 20:27
so, you can do pre/post and install.

greg
2017-11-20 20:27
clonezilla looks interesting.

greg
2017-11-20 20:28
An interesting thing is that you could build a clonezilla bootenv with custom templates to restore and/or backup systems.

wdennis
2017-11-20 20:54
Oh sure

2017-11-20 22:28
Can I change the name of a discovered machine?

greg
2017-11-20 22:30
Yes - only through the cli and API (not the UX) currently.

greg
2017-11-20 22:30
`drpcli machines update <uuid> '{ "Name": "newname" }'`

2017-11-20 22:31
Ah thanks

zehicle
2017-11-21 04:51
@ctrees we did use markdown for a while - switching to RST and Sphinx lets us use readthedocs and have all the awesome xref, index and PDF features that come with treating the docs like a book.

wdennis
2017-11-21 13:35
Did not notice an announcement that there's a v3.3 (new stable) out now?

ctrees
2017-11-21 13:39
I think the community meetup is still on for ?? Wed ??

wdennis
2017-11-21 13:41
idk

greg
2017-11-21 13:41
Today. Sorry about that. Finished it yesterday but had to take care of some family things. Will discuss it today


greg
2017-11-21 13:41
Every other Tuesday

wdennis
2017-11-21 13:42
thx

wdennis
2017-11-21 13:42
Anyways, no problem, just wondered if I missed the announcement?

greg
2017-11-21 13:43
The short of it: small release. The minor bump is because I tweaked the API in subnets: the addition of a field that defaults to a good value.

wdennis
2017-11-21 13:44
I see from commits that Swagger is going away soon?

greg
2017-11-21 13:45
The CLI is generated from Swagger. It is a pain. Bloated and wrong. We will talk about that to.

wdennis
2017-11-21 13:45
OK

greg
2017-11-21 13:45
Too

wdennis
2017-11-21 17:09
Could I get some help configuring the IPMI plugin?

shane
2017-11-21 17:57
In just about an hour, we'll be hosting the v005 Digital Rebar online meetup - lots to discuss around new features, the new v3.3 release, content use case, documentation, etc... check out Agenda items, along with links to the meetup zoom URL, at: https://docs.google.com/document/d/1EDme5I05Sxwe111iluQDa1E-OiLY0xkKTCEn7bQIvfA


shane
2017-11-21 17:58
^^^^^^


zehicle
2017-11-21 19:17
a very very simple version of the runner

carl
2017-11-21 20:02
A discussion point for next week unless it's already been covered: UEFI support (it's sadly becoming a problem for me, and will be required for most commercial ARM platforms)

shane
2017-11-21 20:02
we have discussed it very briefly - but have not focused on it much

shane
2017-11-21 20:03
UEFI PXE options

shane
2017-11-21 20:04
In your `subnets` specification - you'd set DHCP options similarly. We also support the use of the Golang Template language within the subnets/DHCP options to do interesting things - namely change which PXE file is provided based on the options we receive from a Machine
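
The idea of choosing the PXE file from what the Machine sends can be illustrated outside of DRP. The sketch below is plain shell, not the Golang template syntax DRP uses in subnet options, and it assumes the deciding value is DHCP option 93 (client system architecture). The type codes follow the IANA registry as I understand it; `lpxelinux.0` is the file named earlier in this channel, while `ipxe.efi` and `ipxe-arm64.efi` are hypothetical file names.

```shell
# Pick a boot file from the DHCP option 93 (client architecture) value.
# File names other than lpxelinux.0 are placeholders for illustration.
bootfile_for_arch() {
    case "$1" in
        0)    echo "lpxelinux.0" ;;      # legacy BIOS x86
        7|9)  echo "ipxe.efi" ;;         # x86-64 UEFI
        11)   echo "ipxe-arm64.efi" ;;   # 64-bit ARM UEFI
        *)    echo "lpxelinux.0" ;;      # unknown type: fall back to BIOS file
    esac
}

# Example: bootfile_for_arch 9
```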

shane
2017-11-21 20:04
around the Nov13th time frame there was discussion here in #community you might want to review and see if that helps ?

shane
2017-11-21 20:15
Also discussed in today's meetup was the Runner/Workflow system - here's the drawing:

shane
2017-11-21 20:16
@shane uploaded a file: https://rackn.slack.com/files/U6QFVRJNB/F83QJ667M/digital_rebar_runner_workflow.pdf and commented: From 2017/11/21 DRP v005 meetup discussion.

carl
2017-11-21 20:20
Good to know. Thanks!

wdennis
2017-11-21 20:25
@shane Could I get some help configuring the IPMI plugin? (are there any docs on this?)

shane
2017-11-21 20:27
I don't think there are docs

wdennis
2017-11-21 20:28
I installed the plugin, "activated" it, and have set a param on it for password

wdennis
2017-11-21 20:29
Not sure what else might need to be done, but big issue is how to apply it to the machines...

shane
2017-11-21 20:29
which param

wdennis
2017-11-21 20:29
`ipmi/configure/password`

shane
2017-11-21 20:30
I haven't played w/ the IPMI plugin much lately - but I'm guessing you also have to set a username and machine IP address

shane
2017-11-21 20:30
so you'd have to create either a set of Params you apply to a machine - or a profile which has those settings - per machine

wdennis
2017-11-21 20:31
There's only one entry in Systems > Plugins, correct?

wdennis
2017-11-21 20:34
Corresponding to this from drpcli:
```
[dradmin@dr-admin ~]$ drpcli plugins list
[
  {
    "Available": true,
    "Errors": [],
    "Name": "ipmi",
    "Params": {
      "ipmi/configure/password": "xxxxxxx"
    },
    "PluginErrors": [],
    "Provider": "ipmi",
    "ReadOnly": false,
    "Validated": true
  }
]
```

shane
2017-11-21 20:35
:slightly_smiling_face:

shane
2017-11-21 20:35
other params: `ipmi/configure/username`

shane
2017-11-21 20:35
`ipmi/address`

wdennis
2017-11-21 20:35
If they are all the same across my hosts, I can set the param values in the plugin, correct?

shane
2017-11-21 20:37
ah - so the ones w/ `configure` in them are for bootenv install time to SET the username/password parameters (etc.)

shane
2017-11-21 20:38
are you just looking to add power management controls ?

wdennis
2017-11-21 20:38
Yes, power cycle, next boot pxe, etc

wdennis
2017-11-21 20:42
I don't see how to "enable" a host for IPMI (and get the power controls in the UX for the host)

wdennis
2017-11-21 20:55
@shane Any idea?

shane
2017-11-21 20:58
discussing it - I haven't played with this plugin - my work to date has only been in packet - where enabling it is very different

shane
2017-11-21 20:58
I am very familiar w/ BMCs and IPMI management - just checking out how to do this properly w/in the product right now

wdennis
2017-11-21 20:58
Cool, thx

shane
2017-11-21 21:00
sadly - I don't have any hardware to really test this against ... so reading code, and what not

wdennis
2017-11-21 21:03
I'm using `ipmitool` outboard right now to kick the machines, like to control them from the UX (a la Rob's recent terraform vid)

shane
2017-11-21 21:03
yep

wdennis
2017-11-21 21:04
maybe it isn't integrated into the UX yet?

shane
2017-11-21 21:05
there's the `task` named `ipmi-configure` which implements the actual configuration which would happen as a stage - there are templates which are used for the configure

shane
2017-11-21 21:05
but looking at pure power management - I think the only 3 params that matter are: `ipmi/address` `ipmi/username` `ipmi/password`

shane
2017-11-21 21:06
you need these applied to a Machine - which then should enable the IPMI power actions in the UX

shane
2017-11-21 21:06
username and password can be a profile - that you apply (even in `global` if you wanted to apply to everything)

wdennis
2017-11-21 21:06
Cool, I have all three set on one of my machines - however, no IPMI buttons showing up...

shane
2017-11-21 21:06
address would have to be a per-Machine param

wdennis
2017-11-21 21:06
yes

vlowther
2017-11-21 21:07
It looks like setting ipmi/enabled, ipmi/username, ipmi/password, and ipmi/address to the proper values on each node (via profile for all but address, which is definitely node-specific) is what is required

vlowther
2017-11-21 21:07
the ipmi-configure task will do that

vlowther
2017-11-21 21:07
if you have existing settings, it looks like ipmi/configure/network=false and ipmi/configure/user=false params will keep the ipmi-configure task from mucking about with your settings.

wdennis
2017-11-21 21:07
Ah, `ipmi/enabled`...

wdennis
2017-11-21 21:07
Let me try setting that

shane
2017-11-21 21:08
make sure your profile applies the `ipmi/configure/...=false` settings as @vlowther mentioned for safety reasons, too
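
Pulling the parameter names from this discussion together, a shared profile could look roughly like the output of the sketch below. Treat it as a guess at the shape, not a verified recipe: the `Name`/`Params` layout, the boolean `true` for `ipmi/enabled`, the profile name, and the `drpcli profiles create -` call (modeled on the reservations example earlier) are all assumptions. `ipmi/address` is left out on purpose because it has to be set per Machine.

```shell
# Print a profile holding the IPMI settings shared by all machines.
# Arguments: <profile name> <ipmi username> <ipmi password>
make_ipmi_profile() {
    profile_name=$1
    ipmi_user=$2
    ipmi_pass=$3
    cat <<EOF
{
  "Name": "$profile_name",
  "Params": {
    "ipmi/enabled": true,
    "ipmi/username": "$ipmi_user",
    "ipmi/password": "$ipmi_pass",
    "ipmi/configure/network": false,
    "ipmi/configure/user": false
  }
}
EOF
}

# Example (assumed command form, unverified):
# make_ipmi_profile ipmi-shared root 'xxxxxxx' | drpcli profiles create -
```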

wdennis
2017-11-21 21:10
OK, now without doing anything... The IPMI buttons are showing up on the machine

vlowther
2017-11-21 21:10
As expected.

wdennis
2017-11-21 21:10
(Perhaps I moved off the machine screen and then back on?)

wdennis
2017-11-21 21:11
When I clicked "Reboot" I got a popup saying "Action Error"

greg
2017-11-21 21:11
The ipmi plugin requires no parameters. The configure parameters would need to be on a machine.

wdennis
2017-11-21 21:12
Just these three? ``` "ipmi/address": "testnode01-ipmi", "ipmi/password": "xxxxxxx", "ipmi/username": "root" ```

shane
2017-11-21 21:13
might be dns problem w/ ipmi address resolution ?

shane
2017-11-21 21:13
from command line on DRP Endpoint - does that short name resolve correctly ?

wdennis
2017-11-21 21:13
yes

wdennis
2017-11-21 21:14
```[dradmin@dr-admin ~]$ dig testnode01-ipmi ; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7 <<>> testnode01-ipmi ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31366 ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 4096 ;; QUESTION SECTION: ;testnode01-ipmi. IN A ;; ANSWER SECTION: testnode01-ipmi. 1 IN A 192.168.1.161 ;; Query time: 0 msec ;; SERVER: 192.168.1.254#53(192.168.1.254) ;; WHEN: Tue Nov 21 11:57:16 EST 2017 ;; MSG SIZE rcvd: 60 ```

shane
2017-11-21 21:14
can you confirm that `ipmitool -U root -P xxxxx -H testnode01-ipmi chassis power status` runs successfully **from** the DRP Endpoint ?

wdennis
2017-11-21 21:18
Interesting... I usually don't use the `-P` param (if you omit it, it asks for the password interactively)... When I passed it, here's what I see: ```[dradmin@dr-admin ~]$ ipmitool -I lan -H testnode01-ipmi -U root -P xxxxxxx -a chassis power status Password:```

wdennis
2017-11-21 21:19
If I just hit <CR> at the password prompt, I get a fail: ```[dradmin@dr-admin ~]$ ipmitool -I lan -H testnode01-ipmi -U root -P xxxxxx -a chassis power status Password: Activate Session command failed Error: Unable to establish LAN session Error: Unable to establish IPMI v1.5 / RMCP session```

vlowther
2017-11-21 21:19
-I lanplus

vlowther
2017-11-21 21:20
and -a?

wdennis
2017-11-21 21:20
same diff...

wdennis
2017-11-21 21:20
Let me add `-a`

shane
2017-11-21 21:20
can you `ping testnode01-ipmi` ?

wdennis
2017-11-21 21:21
Nope, still asking for password

wdennis
2017-11-21 21:21
@shane - I can, it resolves the IP, but the IPMI interface doesn't allow pings

wdennis
2017-11-21 21:22
```[dradmin@dr-admin ~]$ ping testnode01-ipmi PING testnode01-ipmi.necla.lab (192.168.1.161) 56(84) bytes of data. ^C --- testnode01-ipmi.necla.lab ping statistics --- 9 packets transmitted, 0 received, 100% packet loss, time 7999ms```

wdennis
2017-11-21 21:22
But, I know IPMI works when I use `ipmitool` with user & password

greg
2017-11-21 21:22
remove the `-a`

wdennis
2017-11-21 21:23
duh

wdennis
2017-11-21 21:23
```[dradmin@dr-admin ~]$ ipmitool -I lanplus -H testnode01-ipmi -U root -P xxxxxx chassis power status Error: Unable to establish IPMI v2 / RMCP+ session```

wdennis
2017-11-21 21:24
This *is* a fairly old box; a PowerEdge 860 circa 2009...

wdennis
2017-11-21 21:25
Needs to use IPMIv2, huh?

vlowther
2017-11-21 21:25
So far it just looks like the dr-admin box cannot talk to that IPMI controller.

vlowther
2017-11-21 21:26
unless I missed something in the backscroll.

wdennis
2017-11-21 21:26
No, I can talk to it via IPMIv1.5...

vlowther
2017-11-21 21:26
Well then.

vlowther
2017-11-21 21:27
We may need to make the IPMI protocol configurable. :slightly_smiling_face:

wdennis
2017-11-21 21:27
```[dradmin@dr-admin ~]$ ipmitool -I lan -H testnode01-ipmi -U root -a chassis power status Password: Chassis Power is on```

vlowther
2017-11-21 21:27
IIRC, we have lanplus hardcoded right now

wdennis
2017-11-21 21:27
ah

greg
2017-11-21 21:27
yes

vlowther
2017-11-21 21:28
due to lan being old, trivially crackable, and the 9 and 10gen box we started dev on way back in the day supporting lanplus. :)

wdennis
2017-11-21 21:28
OK, then maybe only supporting `lanplus` is reasonable then :slightly_smiling_face:

wdennis
2017-11-21 21:29
notes to self must get more recent test platforms...

wdennis
2017-11-21 21:30
I do have a PE R320 with standard iDRAC on my testbed, lemme try that...

vlowther
2017-11-21 21:35
note that lanplus is also crackable, but it is at least possible to make it harder to do so

vlowther
2017-11-21 21:41
http://fish2.com/ipmi/cipherzero.html <-- for your viewing pleasure

wdennis
2017-11-21 21:55
@vlowther thx

wdennis
2017-11-21 21:55
shudders

wdennis
2017-11-21 21:56
Yay, working platform!

shane
2017-11-21 21:56
cool - we'll add an option to configure the IPMI Interface type to use
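
Until the interface type is configurable in the plugin, the manual check done in this thread can be scripted from the DRP Endpoint: try `lanplus` first and fall back to `lan`. This is a sketch using the same `ipmitool` flags shown above; the function name and the overridable `IPMITOOL` variable are made up for illustration, and passing the password with `-P` exposes it in the process list.

```shell
# Command used to reach the BMC; overridable so the logic can be exercised
# without real hardware.
IPMITOOL=${IPMITOOL:-ipmitool}

# Try each interface in turn and print the first one that answers.
# Arguments: <host> <user> <password>
ipmi_pick_interface() {
    for iface in lanplus lan; do
        if "$IPMITOOL" -I "$iface" -H "$1" -U "$2" -P "$3" \
               chassis power status >/dev/null 2>&1; then
            echo "$iface"
            return 0
        fi
    done
    return 1
}

# Example: ipmi_pick_interface testnode01-ipmi root 'xxxxxxx'
```

On the PowerEdge 860 from this thread the expected answer would be `lan`, which also signals that the box is limited to the weaker IPMI v1.5 protocol.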

wdennis
2017-11-21 21:57
note to find more recent Dell platforms for testbed

2017-11-21 21:57
Time to feed the :bear:!

wdennis
2017-11-21 21:59
@vlowther @greg What does the "Disk" IPMI plugin button do?

wdennis
2017-11-21 22:00
(and why is "PXE" represented as a paperclip??)

greg
2017-11-21 22:00
Set next boot to disk

wdennis
2017-11-21 22:00
ah ok

shane
2017-11-21 22:02
@wdennis what would you represent "PXE" as in icon form ?

wdennis
2017-11-21 22:23
Tinkerbell's shoe (the one with the little Pom-Pom on the tip) :stuck_out_tongue_winking_eye:

wdennis
2017-11-21 22:24
"Pixie", get it? Huh huh huh

shane
2017-11-21 22:37
Yep

ctrees
2017-11-22 16:09
so @wdennis @greg from what I gather, sledgehammer or ?? CoreOS Ignition ?? could perform a / the clonezilla compatible process as a 'task'... I'm assuming the goal could be both to 'create' and / or 'restore' a backup image....

ctrees
2017-11-22 16:18
seems if a sledgehammer task was to 'restore' a clonezilla created image to local disk, that then fits into @wdennis current infra with no change ? ( just coming up to speed on what you both are thinking )

vlowther
2017-11-22 16:24
@wdennis also, the PE860 is 2006, not 2009. :slightly_smiling_face:

greg
2017-11-22 16:44
Yes - I wasn't looking at clonezilla as the starting point, though that might work. It has some custom tools.

greg
2017-11-22 16:44
I was looking more at Hashicorp's Packer tools.

wdennis
2017-11-22 16:48
@vlowther how time flies...

wdennis
2017-11-22 16:49
@greg what format is a Packer image?

greg
2017-11-22 16:49
Packer can generate rootfs tgzs, raw disks, amis,

greg
2017-11-22 16:49
from the same config

wdennis
2017-11-22 16:50
Hmmmm... nice

greg
2017-11-22 16:50
So, the imaging tasks I'm looking at potentially use ignition and a config file generated by template.

wdennis
2017-11-22 16:50
And the image format?

greg
2017-11-22 16:51
The template defines disks, partitions, sw raid, images, filesystems, and possible files.

greg
2017-11-22 16:51
With resize operations.

greg
2017-11-22 16:51
It is a bit of a side project, but kinda works.

greg
2017-11-22 16:51
The main issue I'm working on right now is getting the systems to boot using syslinux bootloaders. May need to switch.

greg
2017-11-22 16:51
The goal would be to allow for windows images as well.

greg
2017-11-22 16:52
Since packer generates both raw disk, raw partition, and filesystem tarballs you can do both.

wdennis
2017-11-22 16:52
I think it's great that you are working on this...

greg
2017-11-22 16:53
The idea is that you could build an ignition+ template that defines the partition layer, the filesystem type, and then the image tarball for each partition.

greg
2017-11-22 16:53
That would be one level.

greg
2017-11-22 16:53
You could also get a raw disk image with a resize to fill disk option.

greg
2017-11-22 16:53
or a partition option as well.

wdennis
2017-11-22 16:53
FS tarballs could be restored without mucking with partition resizing

greg
2017-11-22 16:54
yes

wdennis
2017-11-22 16:54
Love it

greg
2017-11-22 16:54
Still a ways off. I have most of the imaging pieces. I'm still fighting with getting the systems to boot.

greg
2017-11-22 16:54
May have to switch from syslinux based booting to grub2 based booting.

greg
2017-11-22 16:55
And windows sucks.

wdennis
2017-11-22 16:55
On that we can all agree :)

wdennis
2017-11-22 17:03
@greg looking at Packer docs; what Builder type do you use to create the FS tarballs?

shane
2017-11-22 17:07
packer is a pretty nice tool - it's possible to make images that can be used for VMs and baremetal from the same base code - so you can have consistency across your infrastructure ...

greg
2017-11-22 17:09
Not sure - I know it can be done though.

wdennis
2017-11-22 17:11
Hmmm... wonder how this can be done?

shane
2017-11-22 17:13
the Hashi docs are always awful - they miss about 50 to 75% of functionality - and they don't provide any actually useful examples

shane
2017-11-22 17:13
you have to resort to googley-oogly searches to find real info on most of their products

shane
2017-11-22 17:13
packer is the same - absolutely bare minimal documentation

shane
2017-11-22 17:14
(yeah, I know - glass houses, stones, and all of that ... )

wdennis
2017-11-22 17:18
Power of.... _Google!_

shane
2017-11-22 17:19
Tarball is a post processor, it's not done as a builder

greg
2017-11-22 17:20
yeah - it is complex but pretty powerful.

shane
2017-11-22 17:23
ultimately - packer basically "spins up" ... something ... that's a "builder" ... then you do "stuff" via their DSL (or a "provisioner"). Then you do something with the build - that's the "post processor" part. you can do some pretty cool full CI/CD workflow with it - define, create, configure, deploy, test, create artifacts (eg tarball, raw, etc), then tear down the builder

shane
2017-11-22 17:27
it'll also do staging of your artifacts to "places" - registries, repos, etc.

shane
2017-11-22 17:28
there are a number of non-Hashi maintained plugs too - so there's a lot more out there that you can do - than what you find on Hashi website

wdennis
2017-11-22 17:29
Yup, got that - have played with Packer a bit to build images for AWS and DO

wdennis
2017-11-22 17:32
Someone asked on the Packer Google Group about a builder for bare metal targets (Sep 5 2017 post), someone else answered that Packer only targets cloud providers & hypervisors that support snapshot images

shane
2017-11-22 17:33
building an image is a different thing than creating artifacts to deploy bare metal from ...

shane
2017-11-22 17:33
but most of the Hashi tools are "cloud centric" ...

wdennis
2017-11-22 17:34
Doesn't sound like quite the right tool for the job...

wdennis
2017-11-22 17:34
I thought from what @greg said that you guys had it working tho

shane
2017-11-22 17:35
we have indeed used it in the past - I'm not sure in what capacity - since I haven't used it w/in RackN - I used it at my previous 2 gigs

greg
2017-11-22 17:43
I said I can deploy tgz fs images, raw disk images, or partition images. Where those come from is up to you. I've worked with people who have used packer to generate those.

wdennis
2017-11-22 17:43
Well, I'll wait to see what you guys come up with...

wdennis
2017-11-22 17:43
@greg oh, misunderstood then...

greg
2017-11-22 17:44
What I can't currently do is boot the system consistently afterwards. That is why this isn't finished.

greg
2017-11-22 17:44
We worked with one customer to do image-based installs. Generated a very custom system for them. I've been looking at generalizing that for wider consumption. I'm not done and may not be done for a while.

wdennis
2017-11-22 17:45
Got it

2017-11-22 17:46
so curious whats used to tie the rebar backend to deploy systems, to say the customer facing side... from a hosting/server reseller perspective.....

2017-11-22 17:47
we want say customer A to be able to login / order/ pay and spin up both physical hardware and XenServer VMs

shane
2017-11-22 17:47
@wdennis - here's an example of how someone is using packer/docker/and tooling to extract rootfs for http://packet.net images: https://github.com/packethost/packet-images

wdennis
2017-11-22 17:50
Very interesting

wdennis
2017-11-22 17:51
Those Packet folks got it going on...

greg
2017-11-22 17:54
@outbackdingo - DRv2 used to have a multi-tenant system to handle some of this. We found it complex and people weren't using it. So, in DRP, we don't have that for now. We'll see if it needs to grow one. DRP does have an object-level restriction system, but using it for multi-tenancy could be a stretch. Doable, but a stretch.

2017-11-22 17:57
@greg so when looking at say hetzner OVH, and every other server/hosting provider out there, what in heck are they using to deploy for clients in the backend... ive looked high and low

greg
2017-11-22 18:01
@outbackdingo - well - not sure. I can guess. I suspect most roll their own in some way shape or form.

greg
2017-11-22 18:01
If I were doing it, I would actually use DRP as part of the solution.

2017-11-22 18:02
@greg yes DRP i plan to try to deploy tonight... wheres the guide again

2017-11-22 18:03
ive contemplated rolling something from ansible / terraform

2017-11-22 18:03
but id figure someones already selling something like this for hosting providers

greg
2017-11-22 18:03
For an SP or HP, you would need a billing system, a user/identity/control system, and a provisioning system. I would start them as three separate services, then have the user/identity control system drive the billing and the provisioning systems. But that is the high-level fluff version.

greg
2017-11-22 18:03
:slightly_smiling_face:

2017-11-22 18:04
yeah from the old web hosting only days... WHMCS... blah......


2017-11-22 18:12
@greg Debian 9 host ok to deploy? or CentOS 7 better?

2017-11-22 18:13
i prefer FreeBSD.... but.......

2017-11-22 18:13
its all in docker containers still right ?

shane
2017-11-22 18:13
No docker

greg
2017-11-22 18:13
no - single go binary

2017-11-22 18:13
ok... nothing here says what server os should be

shane
2017-11-22 18:14
(unless you want to stick the golang binary in one for fun)

shane
2017-11-22 18:15
Any Linux distro that is running on 64 bit hardware will work.

shane
2017-11-22 18:15
We do recommend centos or Ubuntu as we test on that

2017-11-22 18:15
ok Debian VM it is

shane
2017-11-22 18:17
Debian should work fine, the installer verifies dependencies and takes care of them, I think Debian is working in installer

2017-11-22 18:37
ok how to i stop these damn console messages scrolling by

2017-11-22 18:37
jeeez

2017-11-22 18:39
and why is this giving an error? `./drpcli machines bootenv 59bcca1e-7cfb-4ab4-ae2c-7e5475205b36 centos-7-install` returns `Error: ValidationError: machines/59bcca1e-7cfb-4ab4-ae2c-7e5475205b36: Can not change bootenv while in a stage unless forced. old: sledgehammer new centos-7-install`

greg
2017-11-22 18:43
The first is because you ran in isolated mode and it logs those messages to stdout/stderr. Under production those go to systemctl logging.

greg
2017-11-22 18:43
Second, you need to change the machines stage instead of bootenv. Setting the stage will set the bootenv.

greg
2017-11-22 18:43
Same command, change `bootenv` to `stage`

2017-11-22 18:46
ok...install doing a centos automated install on an XenServer template

2017-11-22 18:47
interesting....

2017-11-22 18:47
lets see what happens when its done

2017-11-22 18:49
Nice UI by the way... very nice

greg
2017-11-22 18:49
@zehicle will like to hear that. :slightly_smiling_face:

greg
2017-11-22 18:50
Did you do workflows?

2017-11-22 18:52
@greg uhoohhh ? workflows ?

2017-11-22 18:52
i followed the guide

greg
2017-11-22 18:52
well - it may cycle.

greg
2017-11-22 18:52
because you didn't tell it what to do when done installing.

greg
2017-11-22 18:52
I have a fix for that coming, but it isn't ready yet. maybe next week.

greg
2017-11-22 18:52
Anyway,

greg
2017-11-22 18:53
In workflows, you want to add centos-7-install -> complete-nowait Success

2017-11-22 18:53
hah... the docs stated: "Reboot your Machine - it should now kick off a BootEnv install as you specified above. Watch the console, and you should see the appropriate installer running. The machine should reboot in to the Operating System you specified."

greg
2017-11-22 18:53
That last bit may not work. :neutral_face:

greg
2017-11-22 18:54
it should, but may not.

2017-11-22 18:54
and it seems sitting there at running post-installation scripts

2017-11-22 18:54
so where do i update this workflow

greg
2017-11-22 18:54
In the UX, under workflows

greg
2017-11-22 18:55
next to the add step button, `From Stage` should be `centos-7-install`. `To Stage` should be `complete-nowait` and leave success.

greg
2017-11-22 18:56
Then click the add step button.

2017-11-22 19:05
shouldnt success be reboot ?

2017-11-22 19:21
still seems stuck sitting there at running post-installation scripts

2017-11-22 19:28
@greg seems no joy

2017-11-22 19:29
darn and i was hoping to deploy 3 kubernetes nodes tonight....

2017-11-22 19:55
ok seems even the ubuntu install is broken... at least here for some reason... it states no root filesystem is defined

greg
2017-11-22 19:57
No on reboot. We want the install stage to finish its install instead of us rebooting the system.

greg
2017-11-22 19:58
You can check jobs tab to see if anything failed. It might give hints.

2017-11-22 19:59
centos-7-install Start complete-nowait Success (remove step)

greg
2017-11-22 19:59
The no root FS message indicates an LVM install that isn't cleared. We have a stage for that now.

greg
2017-11-22 20:00
It is a more complex workflow.

2017-11-22 20:02
im showing nothing in failed jobs and no successful installs

2017-11-22 20:05
this cant be that broken, ive got to be doing something wrong here

2017-11-22 20:27
tried this 8 times now... same result.... nothing installs completely

2017-11-22 20:46
guess its time to nuke the install and try not using TIP

2017-11-22 21:14
grrrrr.... same result....

2017-11-22 21:20
@greg ok maybe im being dumb but following the quick start results in 0 vms getting installed properly so where is my screw up :)

shane
2017-11-22 21:37
@outbackdingo - I can give you a hand, but it'll be 2 hrs before I can - traveling right now

greg
2017-11-22 21:48
@outbackdingo - to start, lets recap. You have tip installed. You have VMs discovered through sledgehammer. You then change stage to centos-7-install. That appears to hang in post-install process.

greg
2017-11-22 21:49
Two things - I suspect that the stages and their subtlety are getting you. Also, I suspect that you may need more stages.

greg
2017-11-22 21:52
the last one is to address the ubuntu issue.

greg
2017-11-22 21:55
We should set up some things.

greg
2017-11-22 21:55
1. ssh keys for access before and after install.

greg
2017-11-22 21:56
from the cli do this: ```drpcli profiles set global param access-keys to '{ "galthaus": "ssh-rsa bigkeyhere galthaus@Gregs-MacBook-Pro.local" }'```

greg
2017-11-22 21:56
`galthaus` is just a string to identify the key. The second part is the part that will go in the `authorized_keys` file.

greg
2017-11-22 21:56
Next, we need to make sure the workflows are good to go.

greg
2017-11-22 22:00
This is the workflow I use for some of this stuff.
1. Go to UX workflow screen.
2. Remove all steps.
3. Add these:
   a. discover -> sledgehammer-wait success
   b. prep-install -> centos-7-install reboot
   c. centos-7-install -> complete-nowait success

greg
2017-11-22 22:01
4. Remove the machines from DRP (destroy cli or UX delete)
5. Boot a VM. See that it goes through discover and sits in sledgehammer-wait

greg
2017-11-22 22:02
Once the machine gets to `sledgehammer-wait` stage

greg
2017-11-22 22:03
issue this command `drpcli machines stage <uuid> prep-install` or use the ux to change the stage to `prep-install`

greg
2017-11-22 22:03
The advantage of this workflow set up is that it will do two things.

greg
2017-11-22 22:03
One wipe the disks of data.

greg
2017-11-22 22:03
and two it will reboot the machine automatically for you

greg
2017-11-22 22:04
While the machine is in sledgehammer wait, you should be able to ssh into the box and look around.

greg
2017-11-22 22:05
The install should complete by setting the stage to `complete-nowait` and bootenv to `local`.

greg
2017-11-22 22:06
Once booted to the new OS, you should be able to get in by ssh.

2017-11-22 22:35
ok ill work through this

2017-11-23 07:26
ok... so it seems i can now successfully build an image and it boots, however i cant ssh into it... is there a default username/password for them ?

2017-11-23 07:26
so making progress

2017-11-23 08:13
im beginning to wonder just how much work is ahead of me... wanting to A: spin up an XenServer VM ..... install an OS on it... Then run an ansible playbook against it to configure it all...

2017-11-23 08:15
i see no "XenServer" provider even

shane
2017-11-23 19:31
@outbackdingo - please see @greg's comment on how to inject an SSH key to the server config:

2017-11-23 19:34
@shane funny thing is i did that

shane
2017-11-23 19:36
can you please provide the output of (before pasting it - please obscure your SSH public key piece): `drpcli profiles show global`

shane
2017-11-23 19:36
also - the SSH public key will only be injected on provisioning - AFTER you install an OS (bootenv) - just adding the ssh key to the global profile will not affect/change any existing already provisioned machines

2017-11-23 19:41
@shane https://pastebin.com/6EjmjvuS

2017-11-23 19:50
@shane hrmmm i think i see the issue

2017-11-23 19:50
"access-keys": { "dingo": "ssh-rsa ssh-rsa xxxxxxxxxxxxxxxxxxxxxxxxxxxxxzsx

2017-11-23 19:51
ssh-rsa X 2

shane
2017-11-23 19:56
yep - that'd def. cause an issue ... :slightly_smiling_face:
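
The doubled `ssh-rsa` above is easy to miss by eye. A minimal Python sketch of a pre-flight check on the `access-keys` value could look like the following; the function names, the key-type list, and the base64 test are assumptions made for illustration and are not part of `drpcli` or DRP.

```python
import re

# Assumed shape of each value: "<key-type> <base64-key> [comment]",
# i.e. the same layout as one line of an authorized_keys file.
KEY_TYPES = {"ssh-rsa", "ssh-ed25519", "ecdsa-sha2-nistp256"}
BASE64_BLOB = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def check_access_key(value):
    """Return a list of problems found in one access-keys value (empty if none)."""
    fields = value.split()
    if len(fields) < 2:
        return ["expected '<key-type> <base64-key> [comment]'"]
    problems = []
    if fields[0] not in KEY_TYPES:
        problems.append(f"unknown key type: {fields[0]}")
    if fields[1] in KEY_TYPES:
        # This is the "ssh-rsa ssh-rsa ..." mistake from the pastebin output.
        problems.append("key type appears twice")
    elif not BASE64_BLOB.fullmatch(fields[1]):
        problems.append("key material is not base64")
    return problems


def check_access_keys(param):
    """Map each label in the access-keys param to its problems; clean keys are omitted."""
    report = {}
    for label, value in param.items():
        issues = check_access_key(value)
        if issues:
            report[label] = issues
    return report
```

Running `check_access_keys` on the parsed `access-keys` object before setting the param would flag the `dingo` entry while leaving well-formed keys alone.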

2017-11-23 19:57
@shane ok trying again... question though why do i have to change something..... is there no way to say boot VM -> discover -> install

2017-11-23 19:58
discover Start sledgehammer-wait Reboot (remove step) centos-7-install Reboot (remove step) complete-nowait Success (remove step)

2017-11-23 19:58
where i have to run ./drpcli machines stage 1fb9e96b-2627-4f6b-a684-0eeac1657217 centos-7-install and reboot the vm

2017-11-23 20:01
ok that VM successfully installed and i can ssh into it

2017-11-23 20:02
now to deal with this Workflow

2017-11-23 20:02
suggestions welcome

greg
2017-11-23 20:30
Yeah. So what do you want? I suggest a pause to keep from wiping out systems. But that's not your flow. All good.

greg
2017-11-23 20:31
Okay - so now that I'm at a computer and not fighting the phone.

2017-11-23 20:31
@greg id prefer to be able to just go through booting the VM to install to complete-nowait without having do anything

2017-11-23 20:31
no rush

greg
2017-11-23 20:32
@outbackdingo - if you want to just have machines go straight to install, you will want to do something like this:

greg
2017-11-23 20:32
Actually, a couple of questions? Do you wish to inventory the system?

greg
2017-11-23 20:33
and do you want to choose the OS?

2017-11-23 20:34
well initially the game plan is to get to wanting to A: spin up an XenServer VM ..... B: install an OS on that vm... C: Then run an ansible playbook against it to configure it all

2017-11-23 20:34
right now id be happy to be able to choose the os and do the install

2017-11-23 20:34
without intervention

greg
2017-11-23 20:34
Are the disks "clean" on the VM spin up?

2017-11-23 20:34
always

greg
2017-11-23 20:34
okay - then - you can do this.

greg
2017-11-23 20:35
In the UI, set the default stage to the OS install you want to install at the moment.

greg
2017-11-23 20:36
In the workflow, make sure you have a step that goes from whatever stage you selected as the default stage to complete-nowait.

greg
2017-11-23 20:36
with success.

greg
2017-11-23 20:36
Done

greg
2017-11-23 20:37
Ugh - I really want that to work, but it won't.

greg
2017-11-23 20:37
Sorry, still have to create a machine.

greg
2017-11-23 20:37
Turkey on the brain - just a second.

2017-11-23 20:37
so just have a workflow like centos-7-install Start complete-nowait Success (remove step)

greg
2017-11-23 20:38
You will need that for an OS you want to install.

greg
2017-11-23 20:38
To auto install, you will want the following workflow steps:

greg
2017-11-23 20:38
discover -> centos-7-install : Reboot

greg
2017-11-23 20:38
centos-7-install -> complete-nowait : Success

greg
2017-11-23 20:39
Keep adding OSes by repeating the last one.

greg
2017-11-23 20:39
When you want to change the OS globally, remove the discover step and change it to discover -> OS of choice : Reboot

2017-11-23 20:40
ok, but i can create multiple workflows correct ?

greg
2017-11-23 20:40
This should have the machines start, boot into sledgehammer, create a machine entry, and then reboot to install the os, then reboot into the final OS with no changes.
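
The two-step auto-install workflow can be pictured as a small lookup table. The Python below is a toy model only, written to make the stage-to-stage flow concrete; it is not how DRP stores or evaluates workflows, and the names `WORKFLOW`, `walk`, and `set_default_os` are invented for this sketch.

```python
# Toy model: each entry maps a stage to (next stage, action when the stage finishes).
WORKFLOW = {
    "discover": ("centos-7-install", "Reboot"),
    "centos-7-install": ("complete-nowait", "Success"),
}


def walk(workflow, start):
    """Follow steps from `start` until a stage has no outgoing step."""
    path = [start]
    seen = {start}
    stage = start
    while stage in workflow:
        stage, _action = workflow[stage]
        if stage in seen:
            raise ValueError(f"workflow loops back to {stage}")
        seen.add(stage)
        path.append(stage)
    return path


def set_default_os(workflow, install_stage):
    """Return a copy where discover hands off to a different install stage."""
    if install_stage not in workflow:
        raise KeyError(f"no step defined from {install_stage}")
    updated = dict(workflow)
    updated["discover"] = (install_stage, "Reboot")
    return updated
```

In this model, "keep adding OSes by repeating the last one" means adding another `<os>-install -> complete-nowait` entry, and changing the OS globally means calling `set_default_os` so that only the `discover` entry changes.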

2017-11-23 20:40
ok

greg
2017-11-23 20:40
The problem is how to choose the OS you want when you want.

2017-11-23 20:40
right

greg
2017-11-23 20:40
usually, that requires a step to either change the stage or add a parameter/change stage.

greg
2017-11-23 20:41
Do you use terraform?

2017-11-23 20:42
@greg i think its going to be needed if i plan to create xenserver VMs -> then boot them and install an OS

2017-11-23 20:42
i dont see any providers in rebar for xenserver

greg
2017-11-23 20:42
Can you describe your XenServer environment? Single node or multi-node XenServer?

2017-11-23 20:42
lab is single node... deployed is multinode

greg
2017-11-23 20:42
We don't have providers anymore for anything that creates the machine. We could but don't. Wasn't our direct business.

greg
2017-11-23 20:43
Okay - do you have an API that can create and wait for servers?

2017-11-23 20:43
yes

greg
2017-11-23 20:43
hmm - okay - thinking about this.

2017-11-23 20:45
does terraform work with rebar ?

greg
2017-11-23 20:46
We have a provider that plugs into terraform that can drive DRP and choose stage.

greg
2017-11-23 20:46
@shane has the start of mixing packet and DRP to do what you are trying to do. I think he is close.

2017-11-23 20:47
essentially... i have two targets the create boot xenserver vm, install os, run ansible against vm.... second create a group of xenserver vms, install kubernetes cluster

greg
2017-11-23 20:47
We have some cases for that. Or close on most.

2017-11-23 20:47
i did also see the kubernetes stuff... but i am thinking it requires an installed OS in the VM first

greg
2017-11-23 20:48
well - it does today. I'm trying a Live OS k8s cluster, but that is for another day.

2017-11-23 20:49
so if i have say 3 vms installed with centos ? i can deploy kubernetes onto it now ?

greg
2017-11-23 20:49
Well , you can use kubespray ansible playbook and DRP ansible inventory generator to do it. I think @zehicle posted a video with this.

greg
2017-11-23 20:50
You get three machines OS installed through the workflow, then add them into a profile and that can then be used with an ansible inventory generator to build those machines.

greg
2017-11-23 20:51
Checking our youtube channel.




2017-11-23 21:00
cool... ill watch this tonight and again tomorrow morning and see where i get to with it

zehicle
2017-11-23 21:01
Sledgehammer works for testing kubespray too


greg
2017-11-23 21:16
okay - I'm back to thanksgiving partying.

2017-11-23 21:17
@greg enjoy it.... im heading to sleep 10PM where i am in Italy

2017-11-23 21:17
@zehicle ill look at it also

zehicle
2017-11-23 21:34
It's the written version of the video

zehicle
2017-11-23 21:34
We can also get you a slack account

zehicle
2017-11-23 21:35
There is a form for it on RackN.com

zehicle
2017-11-23 21:35
From the ux

2017-11-24 07:21
@zehicle ok slack account requested, working to roll this up today and get it done

i.grischott
2017-11-24 12:38
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F86047EUW/grafik.png and commented: Hi, it's my first time here. I'm an IT technician and I'm looking for a good solution to manage bare metal servers, kubernetes, kvm... First I checked solutions from CoreOS with Tectonic, The Foreman, and Proxmox. Now I'm here :slightly_smiling_face: I already have dr-provision installed and it seems to be running...

i.grischott
2017-11-24 12:41
i want to add some machines .. but i don't know how..

i.grischott
2017-11-24 12:42

i.grischott
2017-11-24 12:42
PXE Boot won't work.

i.grischott
2017-11-24 12:43
In the documentation i didn't find how can i initialize the bare metal machines with PXE Boot...

i.grischott
2017-11-24 13:22
Sorry forget my post I made a reasoning mistake .. I have the dr-provisioner running in a docker container on CoreOS, that can not work ..

2017-11-24 13:30
@i.grischott it can if the ports are exposed

zehicle
2017-11-24 14:25
@i.grischott you need to make sure you set the config preferences to provide the discovery image - the defaults ignore requests. Which also means that you have to upload the sledgehammer image. The discovery image will auto register the machines when it boots.

zehicle
2017-11-24 14:26
Also, you may need to set the --static-ip address for the interface you are listening on depending on the o/s you are installed on.

kamp.scott
2017-11-24 15:03
has joined #json

shane
2017-11-24 16:28
@i.grischott - can you please paste the output of the following command from your DRP Endpoint: `drpcli subnets show`

i.grischott
2017-11-24 16:29
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F85AW211B/tftpp_read_timeout.jpg and commented: one step further..

shane
2017-11-24 16:29
also - it's important to verify that you map/allow the following ports in to the DRP Endpoint: 67 for DHCP, 69 for TFTP, 8091 for HTTP, 8092 for API access

shane
2017-11-24 16:35
from an external node to the DRP Endpoint, you can verify most of these connections by the following tests:
for TFTP test:
```
tftp 172.17.0.2
get ipxe.pxe
```
for HTTP test:
```
curl -s -o /tmp/ipxe.pxe http://172.17.0.1:8091/ipxe.pxe
```
for API test - install `drpcli` (or copy the appropriate architecture binary to your remote machine), and run:
```
drpcli -E https://172.17.0.1:8092/ info get
```
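
For a quick reachability check of the two TCP ports without `curl` or `drpcli` on the remote machine, a minimal Python sketch could look like this. It only tests that a TCP connection can be opened; the DHCP (67) and TFTP (69) ports are UDP and are not covered. The function names and the default timeout are assumptions for illustration.

```python
import socket

# TCP ports quoted in the chat. 67 (DHCP) and 69 (TFTP) are UDP and not probed here.
TCP_PORTS = {"HTTP": 8091, "API": 8092}


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoint(host, ports=None):
    """Return {service name: reachable?} for each TCP port of interest."""
    ports = TCP_PORTS if ports is None else ports
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

For example, `check_endpoint("172.17.0.1")` would report whether 8091 and 8092 accept connections from where the script runs. An open port does not prove the service behind it is answering correctly, so the `curl` and `drpcli` tests are still the real check.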

shane
2017-11-24 16:37
you also need to start the `dr-provision` environment with the `--static-ip` set to the external NAT address that maps to the container - otherwise, the DRP Endpoint won't respond correctly

shane
2017-11-24 16:38
so - on starting the DRP endpoint, do: `dr-provision --static-ip=172.17.0.1 ... other args ... `

i.grischott
2017-11-24 16:45
for testing i start the container with --net=host..

i.grischott
2017-11-24 16:48
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F84M3EHJM/tftp_get_works_locally.jpg and commented: think it's a port mapping problem.. can't tftp get from other machine..

greg
2017-11-24 19:50
Make sure you use the IP that can access the tftp directory as the `--static-ip`

greg
2017-11-24 19:51
`--static-ip=192.168.0.1`

kamp.scott
2017-11-25 08:36
is there a way to provision CentOS 6 machine? or do i have to create tasks for all of it

kamp.scott
2017-11-25 08:53
Missing ISO: Please Upload
Explode ISO: iso does not exist: /home/dingo/drp-data/tftpboot/isos/CentOS-6.9-x86_64-bin-DVD1.iso
Error You can download the required ISO from http://mirrors.kernel.org/centos/6.9/isos/x86_64/CentOS-6.9-x86_64-bin-DVD1.iso
Error bootenv: centos-6.9-install: missing kernel images/pxeboot/vmlinuz (/home/dingo/drp-data/tftpboot/centos-6.9/install/images/pxeboot/vmlinuz)
Error bootenv: centos-6.9-install: missing initrd images/pxeboot/initrd.img (/home/dingo/drp-data/tftpboot/centos-6.9/install/images/pxeboot/initrd.img)

kamp.scott
2017-11-25 08:54
but i did upload the iso and it does show under ISOS trying to enable Centos 6 environments

kamp.scott
2017-11-25 08:54
CentOS-6.9-x86_64-bin-DVD1.iso 100%[===================================================================>] 3.70G 20.1MB/s in 3m 13s
2017-11-25 03:49:14 (19.6 MB/s) - 'CentOS-6.9-x86_64-bin-DVD1.iso' saved [3972005888/3972005888]
root@streisand:/home/dingo# mv CentOS-6.9-x86_64-bin-DVD1.iso /home/dingo/drp-data/tftpboot/isos/

kamp.scott
2017-11-25 09:10
./drpcli bootenvs uploadiso centos-6-install seems to have worked

i.grischott
2017-11-25 22:11
it works now.. the default gateway wasn't set on the docker host :disappointed:


i.grischott
2017-11-25 22:14
is it possible to add other OSes for deploying? like Container Linux (aka CoreOS).. i like the update functionality and security of this OS..

i.grischott
2017-11-25 22:47

i.grischott
2017-11-25 22:52
i want to set up openstack on kubernetes.. i'm trying to adapt this video (https://www.youtube.com/watch?v=6xuVm9PJ2ck) to the new UI .. puh.. a lot of new features .. is there an easier guide to initialize openstack on kubernetes?

shane
2017-11-25 23:51
@i.grischott your best bet is to use an existing Ansible playbook with our Ansible content pack

shane
2017-11-26 01:11
that video is from Digital Rebar ver 2 - the current version (Digital Rebar Provision ver 3) does not support cross-node orchestration - however, we support integrations with third party tools (like Ansible) which allow you to do complex application installations through that tooling

kamp.scott
2017-11-26 09:39
wait ... what? openstack on kubernetes? isn't that backwards? should be kubernetes on openstack ?

kamp.scott
2017-11-26 09:49
sounds almost like Joyent's Triton.....

shane
2017-11-26 15:25
@kamp.scott - yes, there is a big shift in OpenStack to use Kubernetes as the orchestration piece to manage the OpenStack services - there are lots of tooling popping up that containerizes each component of the OpenStack puzzle. This in theory provides a "self-healing" control plane, and also (in theory) minimizes the OpenStack service management overhead ...

kamp.scott
2017-11-26 15:52
id love to see that deployed

kamp.scott
2017-11-26 15:52
though curious how you spin up VMs from kubernetes, probably only kvm supported

zehicle
2017-11-26 19:15
@i.grischott those v2 demos were before AT&T moved the project into OpenStack governance and repackaged it a set of stages. We're working on k8s metal install and watching the OpenStack Helm community to see when they get something generic. There are some other OpenStack on K8s efforts (Kolla) that show promise.

zehicle
2017-11-26 19:15
It seems like our Ansible integration is a key for those efforts and serves a more general purpose anyway.

i.grischott
2017-11-27 07:28
you think a good approach is setup k8s with your plugin kubespray then deploy openstack with helm ?

wdennis
2017-11-27 20:10
protip: Filter out voluminous gohai data in machine records via: `drpcli machines show <uuid> | jq 'del(.Profile)'`

wdennis
2017-11-27 20:13
Hi all! back into the fray

shane
2017-11-27 20:14
...and similarly if you want to list all Machines if you don't know the UUID, and filter out gohai: `drpcli machines list | jq 'del(.[].Profile)'`

wdennis
2017-11-27 20:14
Trying to understand where my machine install is at... I see this output on a machine undergoing an install:
```
[dradmin@dr-admin drp]$ drpcli machines show 4f316320-fb0c-46f2-8578-f0d8f13177e1 | jq 'del(.Profile)'
{
  "Address": "192.168.1.112",
  "Available": true,
  "BootEnv": "ubuntu-16.04-install",
  "CurrentJob": "a4e0839e-31e0-4a87-af9c-9d07f2e3b158",
  "CurrentTask": -1,
  "Errors": [],
  "Name": "ml47",
  "OS": "ubuntu-16.04",
  "Profiles": [
    "necla-ubuntu-default"
  ],
  "ReadOnly": false,
  "Runnable": false,
  "Secret": "xxxxxxxxxxx",
  "Stage": "ubuntu-16.04-install",
  "Tasks": [
    "ubuntu-drp-only-repos",
    "ssh-access",
    "change-stage"
  ],
  "Uuid": "4f316320-fb0c-46f2-8578-f0d8f13177e1",
  "Validated": true
}
```

wdennis
2017-11-27 20:15
The `CurrentJob` attribute is the last job that ran (I see this in the UX in "Jobs")

wdennis
2017-11-27 20:15
Why is `CurrentTask` value a `-1`?

wdennis
2017-11-27 20:18
What I'm trying to do is see that the machine is actually installing the OS (Ubuntu 16.04 in this case)

wdennis
2017-11-27 20:20
Any way to know that from the data returned from `machines show`?

shane
2017-11-27 20:40
@wdennis I'm not sure offhand what `-1` means - but my (infantile) reading of `./backend/machines.go` seems to indicate (rather counter-intuitively) that it means we have Tasks to run:
```
if n.Tasks != nil && len(n.Tasks) > 0 {
	n.CurrentTask = -1
}
```

shane
2017-11-27 20:40
I believe final interpretation is likely going to be needed by @greg or @vlowther

greg
2017-11-27 20:40
yeah - just a minute. Sorry.

greg
2017-11-27 20:41
-1 means start of list.

greg
2017-11-27 20:41
It hasn't tried to run anything yet.

greg
2017-11-27 20:41
This is the one spot where we don't have a good view into what is going on.

greg
2017-11-27 20:42
This tells me that the machine is somewhere in between the boot into install and drpcli getting control in the post-install script of the preseed file.

shane
2017-11-27 20:42
also - you can filter the `gohai-inventory` w/ `jq` - but because `gohai-inventory` contains a dash, which `jq` would otherwise parse as a minus sign, you have to quote the key, as follows: `drpcli machines show <UUID> | jq 'del(.Profile.Params."gohai-inventory")'`

shane
2017-11-27 20:43
your method will also filter out (potentially) other useful `Params` from the JSON output
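
The difference between the two `jq` filters is easier to see on a parsed record. Below is a minimal Python sketch of the narrower filter, assuming the machine JSON has already been loaded into a dict (for example with `json.loads` on the `drpcli machines show` output); the function name `strip_gohai` is invented for illustration.

```python
import copy


def strip_gohai(machine, param="gohai-inventory"):
    """Return a copy of a machine record without the bulky inventory param.

    Unlike deleting the whole Profile, every other entry under
    Profile.Params is kept.
    """
    trimmed = copy.deepcopy(machine)  # leave the caller's record untouched
    profile = trimmed.get("Profile") or {}
    params = profile.get("Params") or {}
    params.pop(param, None)  # no error if the param is absent
    return trimmed
```

A record with no `Profile` or no `Params` passes through unchanged, which mirrors how `del(...)` in `jq` quietly does nothing when the path is missing.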

zehicle
2017-11-27 21:04
@i.grischott I think Helm will _eventually_ be the way to deploy OpenStack. there are some technical items to resolve first.

zehicle
2017-11-27 21:04
we're focused on making k8s installs to metal better b/c that

zehicle
2017-11-27 21:04
is a prereq

ctrees
2017-11-28 14:26
anyone have a good 'ssh key rotation strategy for ops' reference or article ?

ctrees
2017-11-28 14:38
So far... This is what I've got:


vlowther
2017-11-28 14:55
hm, that seems more geared to auditing purposes than anything else.


vlowther
2017-11-28 14:59
along with the SSH certificates section of https://ef.gy/hardening-ssh

vlowther
2017-11-28 15:01
tl;dr: SSH supports auth using signed certs that have valid lifetimes instead of the classic public/private keypairs with unbounded lifetimes

ctrees
2017-11-28 15:49
Thanks!

wdennis
2017-11-28 17:01
@shane Thx for the more precise gohai removal syntax

shane
2017-11-28 17:01
:slightly_smiling_face: no problem ... I fought that for a bit before I realized a dash was "important" to `jq` ...

wdennis
2017-11-28 17:02
There's always a "sin tax" :wink:

shane
2017-11-28 17:04
I added it to the FAQ documentation - will be updated next time we push to "latest"

wdennis
2017-11-28 17:04
So, @greg I'm guessing that `CurrentTask` == `0` means "no more tasks"

greg
2017-11-28 17:12
it depends upon the task list length. :slightly_smiling_face:

greg
2017-11-28 17:12
CurrentTask is the index into the task list.

vlowther
2017-11-28 17:13
CurrentTask == 0 means the zeroth task in the list is the current one.

vlowther
2017-11-28 17:13
:slightly_smiling_face:

vlowther
2017-11-28 17:13
CurrentTask == len(Tasks) means nothing else to do.

greg
2017-11-28 17:13
if CT == len(tasklist), then done. Otherwise, it is at that position (programmer style) in the list.
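
A minimal sketch of the index semantics described above. The helper `describeCurrentTask` and the task names are hypothetical and not part of dr-provision; only the three cases (`-1`, a valid index, and `len(Tasks)`) come from this discussion.

```go
package main

import "fmt"

// describeCurrentTask interprets a CurrentTask index against a task list,
// following the rules stated in the chat. Hypothetical helper for illustration.
func describeCurrentTask(current int, tasks []string) string {
	switch {
	case current == -1:
		// Start of list: nothing has been attempted yet.
		return "not started"
	case current >= 0 && current < len(tasks):
		// Zero-based position of the task currently being worked on.
		return "running " + tasks[current]
	case current == len(tasks):
		// Index has walked off the end: nothing else to do.
		return "done"
	default:
		return "invalid index"
	}
}

func main() {
	tasks := []string{"task1", "task2", "task3"}
	for _, i := range []int{-1, 0, 2, 3} {
		fmt.Printf("CurrentTask=%d: %s\n", i, describeCurrentTask(i, tasks))
	}
}
```

Under these rules, `CurrentTask` of `0` with an empty task list is also "done", since `0 == len(Tasks)`.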

wdennis
2017-11-28 18:34
ah, got it

wdennis
2017-11-28 18:35
I take it then the `Tasks` list persists until the pointer == len(Tasks), then is deleted?

zehicle
2017-11-28 18:52
the UX machines page should have a task list now (with x-links)

vlowther
2017-11-28 19:25
so, : Spaces in names of things: Awesome, ok i guess, or heresy?

vlowther
2017-11-28 19:27
I am adding validation to various names of things, and it would be good to know before I make everyone's life just that much harder.

ctrees
2017-11-28 19:50
Jobs: "Awesome" Woz: "Heresy, spaces are my HEX delimiter" ... pick your abstraction camp

vlowther
2017-11-28 20:04
So, here are the validations I am contemplating:

vlowther
2017-11-28 20:04
var (
    validName = regexp.MustCompile(`^\pL+([- _.]+|\pN+|\pL+)+$`)
    validParamName = regexp.MustCompile(`^\pL+([- _./]+|\pN+|\pL+)+$`)
)

vlowther
2017-11-28 20:05
the former is for everything that is not a param

vlowther
2017-11-28 20:05
the latter is for params for $REASONS

vlowther
2017-11-28 20:06
\pN is everything Unicode considers a number, and \pL is everything Unicode considers a letter

vlowther
2017-11-28 20:06
the rest of it should be obvious enough to anyone who has stared at too much Perl.

vlowther
2017-11-28 20:07
or obsessed about DFA vs NFA w.r.t. speed and feature completeness.
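
For those who have not stared at too much Perl, a small sketch that runs a few sample names through the two patterns quoted above. The patterns are copied from the message; the sample names are made up for illustration, and since these validations were still work in progress, the released rules may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Names of everything that is not a param: must start with a letter,
	// then any mix of separators (dash, space, underscore, dot), digits, and letters.
	validName = regexp.MustCompile(`^\pL+([- _.]+|\pN+|\pL+)+$`)
	// Param names additionally allow "/" as a separator.
	validParamName = regexp.MustCompile(`^\pL+([- _./]+|\pN+|\pL+)+$`)
)

func main() {
	samples := []string{"tocld", "2cld", "my machine", "ubuntu-16.04-install", "a/b"}
	for _, name := range samples {
		fmt.Printf("%q name=%v param=%v\n",
			name, validName.MatchString(name), validParamName.MatchString(name))
	}
}
```

As written, both patterns need at least two characters: one letter for the leading `\pL+` and at least one more character for the repeated group.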

diego.milhomes
2017-11-28 20:19
has joined #json

vlowther
2017-11-28 20:21
@wdennis the Tasks list persists until someone or something changes it

vlowther
2017-11-28 20:21
either directly via the API or indirectly via stage change.

shane
2017-11-28 20:22
@diego.milhomes - welcome

ctrees
2017-11-28 20:47
@vlowther so must start with letter... maybe that's why my "2cld" name had issues

ctrees
2017-11-28 20:49
my regexp foo is weak though...

vlowther
2017-11-28 20:49
well, these are what I am working on right now.

vlowther
2017-11-28 20:50
They are not in a tree you would be using, unless you have some crazy access to my laptop I have not noticed. :slightly_smiling_face:

ctrees
2017-11-28 20:51
oh... it was on other 'ansible' or 'vagrant' things... you just got me thinking 'oh... that's probably why'

ctrees
2017-11-28 20:55
i changed to "tocld" name and it worked... didn't dig that deep... I can ping some moz 'data collection log guys' if you need intense regex foo'ness

wdennis
2017-11-28 22:17
@vlowther No to spaces! (and double no to tabs! :stuck_out_tongue_winking_eye: )

vlowther
2017-11-28 22:19
That makes it a tie!

vlowther
2017-11-28 22:20
For now, it is spaces because some of my unit tests have them.

wdennis
2017-11-28 22:21
I need to work on some mods to my preseed file.... Need to create custom one and link it in; how do I go about doing that?

shane
2017-11-29 15:58
@wdennis - pretty easy
```
drpcli templates show net-seed.tmpl --format=yaml > /tmp/my_fancy_new-net-seed.yaml
vim /tmp/my_fancy_new-net-seed.yaml
# change ID to a new template name
# change whatever else
# leave the late_command stuff in there
drpcli templates create - < /tmp/my_fancy_new-net-seed.yaml
```
Now modify a BootEnv to use your new seed instead of the default `net-seed.tmpl`. Basically - clone a BootEnv, and use that as your stage BootEnv for workflow transition.

ctrees
2017-11-29 16:06
speaking of BootEnv ... via the packet docker [root@buildbox ubuntu1604]# save2image ubuntu1604.tar I should be able to use that as a bootenv image directly ?

wdennis
2017-11-29 17:39
@shane Getting this error when I try to create the new template:
```
[dradmin@dr-admin ~]$ drpcli templates create - < necla-ubu-seed.yaml
Error: Invalid template object: error converting YAML to JSON: yaml: line 143: could not find expected ':' and error converting YAML to JSON: yaml: line 143: could not find expected ':'
```

shane
2017-11-29 17:40
can you direct message me your yaml (excise any sensitive bits)...

shane
2017-11-29 17:41
probably just a space format error in the yaml

shane
2017-11-29 17:41
(at around line 143)

wdennis
2017-11-29 17:42
n/m, indentation error from editing... :face_with_rolling_eyes:

shane
2017-11-29 17:43
:slightly_smiling_face:

wdennis
2017-11-29 17:51
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F86NJD11P/bootenv-clone-fail-ux.png and commented: Next issue... getting error when trying to save bootenv cloned in the UX

shane
2017-11-29 18:05
use `drpcli`:
```
drpcli bootenvs show ubuntu-16.04-install --format=yaml > my-ubuntu-16.04-install.yaml
# modify appropriately - make sure to change "ID"
vim my-ubuntu-16.04-install.yaml
drpcli bootenvs create - < my-ubuntu-16.04-install.yaml
```

shane
2017-11-29 18:06
sorry - I don't use UX, so hadn't run in to that bug - but, I get same thing trying to `clone`

wdennis
2017-11-29 18:15
Thought you guys could use the QA help :stuck_out_tongue_winking_eye:

wdennis
2017-11-29 18:16
(On UX)

shane
2017-11-29 18:16
Since we don't actually have a UX person ... I think it's coming along fabulously ... certainly has rough edges though - no doubt ...

wdennis
2017-11-29 18:18
It's so beautiful that it makes me want to use it...

shane
2017-11-29 18:19
please do ... and ... please file tickets as you bump in to those sharp edges


shane
2017-11-29 23:37
@wdennis - I submitted a pull request for `provision-content` which adds support for a custom preseed to be defined. This lets the stock provided BootEnvs remain unchanged, and you can simply create a Param of `select-seed` with the value of a new template file with your custom preseed changes - it hasn't been approved yet, and will take a little while to work through the system to release... https://github.com/digitalrebar/provision-content/pull/42

wdennis
2017-11-30 00:46
@shane Sounds great! Also do this for RedHat-family distros?

greg
2017-11-30 00:49
Yeah. We are talking about that.

wdennis
2017-11-30 00:56
That would be great -- the more we can stick with DR-provided content, the better (as far as getting updated content etc.)

wdennis
2017-11-30 00:57
Of course, there's still sometimes a benefit to creating custom objects, but this breaks the community/RackN-provided updates via the content system (custom obj's never get updated)

greg
2017-11-30 02:00
Well. Part of me is wondering what you want to do. Could what you want be handled by tasks? Or by @lae's changes?

lae
2017-11-30 02:33
ah so

lae
2017-11-30 02:33
for custom preseeds I maintain my own, "fireeye-content" package...

lae
2017-11-30 02:33
or well, custom anything really

greg
2017-11-30 02:34
Yeah that was part of the purpose for Content Packs. Glad you are using them.

wdennis
2017-11-30 03:43
@greg My goal is to do as little as possible in the preseed, and as much of the work as possible in my configuration mgmt system (Ansible in my case.)

wdennis
2017-11-30 03:44
That being said, I do need to handle some stuff in preseed that needs to happen during OS install, to prep for the CM run

wdennis
2017-11-30 03:44
(Well, disk partitioning too)

wdennis
2017-11-30 03:49
My main deltas from the stock DRP seed file are:
- enable root account with password (sadly, company std) as well as key exchange
- don't create initial user (root suffices)
- install "python-minimal" pkg (provides Py2.7, which is needed for Ansible runs)

wdennis
2017-11-30 03:51
Haven't tried to tackle using differing partitioning schemes, which I believe is enabled via sub-template inclusion

greg
2017-11-30 04:39
in ubuntu right?

greg
2017-11-30 04:40
I would do those as three tasks in a stage that I sequence into a workflow.

greg
2017-11-30 04:40
Use the custom partitioning pieces for partition magic.

greg
2017-11-30 04:41
@wdennis- something like this:

greg
2017-11-30 04:41
task1 - rmuser rocketskates in a shell script.

greg
2017-11-30 04:42
task2 - usermod -p <encrypted password> (from parameter) in script

greg
2017-11-30 04:43
task3 - apt-get install -y python-minimal

greg
2017-11-30 04:43
stage customize - task1, task2, task3, change-stage - RunnerWait = true.

greg
2017-11-30 04:44
workflow: ubuntu-16.04-install -> customize -> complete-nowait

greg
2017-12-01 03:51
- Hi All!

greg
2017-12-01 03:52
The basis for v3.4.0 has been committed to tip.

greg
2017-12-01 03:53
This contains the CLI conversion to use the API instead of the swagger generated API code. This will allow for better maintenance, smaller codebase, and golang API.

shane
2017-12-01 03:53
woot! if y'all have an opportunity to deploy `tip` and test, we'd appreciate it !

greg
2017-12-01 03:53
Additionally, the `change-stage` feature has been added.

greg
2017-12-01 03:55
The runner will now always try to change stage and check the map to see what it should do. This will make stages cleaner.

greg
2017-12-01 03:55
This change also restores the ability to not require a workflow to install an os.

greg
2017-12-01 03:56
setting the machine's stage to <something>-install will set the stage to local and let the install finish if no workflow entries are found upon completion of the installation.

greg
2017-12-01 03:58
The digital rebar and rackn content have been updated to use this. The plugins use this as well.

greg
2017-12-01 04:00
Sooooo... If you upgrade content to latest tip:
digital rebar content = v1.2.0-tip-8-a2dd261d1da79c5f42d34728e5bfad570890da86
rackn content = v1.1.0-tip-3-4624c28ae569ee7f0c0ecff62d5ff33c89c75e01
rackn plugins = v1.2.0-tip-3-7da6c05aa74907f07ddc0db168e3128aa7f2b0bd
you need to use DRP - v3.3.0-tip-19-ac9f7e4d726a579053cdf247d044036b91ff6a12

greg
2017-12-01 04:04
Existing content should continue to work fine. The one caveat: if you are using the runner-service, you need to upgrade to the latest task-library for that to work on new installs.

greg
2017-12-01 04:04
We will plan on cutting v3.4.0 on Monday heading into the community meeting on Tuesday. :slightly_smiling_face:

zehicle
2017-12-01 04:15
great notes (and progress!)

zehicle
2017-12-01 04:15
it's worth noting that the CLI change included a lot of rework/improvement in the test patterns.

zehicle
2017-12-04 00:42
@wdennis patch for your template bug is in, please test when you can

zehicle
2017-12-04 00:42
if it works, please note in the defect.

greg
2017-12-04 22:48
- Hi all, v3.4.0 is out and content updates to v1.3.0. Check out the release notes here: https://github.com/digitalrebar/provision/releases/tag/v3.4.0

lae
2017-12-04 23:38
Any ideas what might cause this? (just attempted an upgrade)
```
Dec 04 23:32:52 labs-provision dr-provision[30920]: dr-provision2017/12/04 23:32:52.762604 Version: v3.4.0-0-3af10535d31b6367778d34446d3092ed543aa059
Dec 04 23:32:52 labs-provision dr-provision[30920]: dr-provision2017/12/04 23:32:52.762662 Extracting Default Assets
Dec 04 23:32:53 labs-provision dr-provision[30920]: panic: assignment to entry in nil map
Dec 04 23:32:53 labs-provision dr-provision[30920]: goroutine 1 [running]:
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/models.(*MetaData).ClearFeatures(...)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/models/meta.go:23
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/backend.(*Machine).BeforeSave(0xc4202b5340, 0xcc9548, 0xc4202b5340)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/backend/machines.go:605 +0x10a
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/backend.(*Machine).OnLoad(0xc4202b5340, 0x0, 0x0)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/backend/machines.go:647 +0x135
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store.load(0x10d3e80, 0xc4200f74d0, 0x10cc680, 0xc4202b5340, 0xc42005850c, 0x24, 0x1, 0xb9b300, 0xc4200e9c00, 0x7f1a6f0c7000)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store/keySaver.go:114 +0xf1
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store.List(0x10d3e80, 0xc4200f74d0, 0x10cc680, 0xc420010480, 0x10cc680, 0xc420010480, 0x15, 0x0, 0x0)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store/keySaver.go:129 +0x174
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/backend.(*DataTracker).rebuildCache(0xc4211fc0e0, 0xc9e2dd, 0x4)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/backend/dataTracker.go:548 +0x2d9
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/backend.NewDataTracker(0x10d3a00, 0xc4211fc000, 0xc420252140, 0x1e, 0xc420252180, 0x1e, 0x7ffdb0c0df3d, 0xc, 0xc420016400, 0x1f9b, ...)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/backend/dataTracker.go:679 +0x67d
Dec 04 23:32:53 labs-provision dr-provision[30920]: github.com/digitalrebar/provision/server.Server(0x188c6a0)
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/server/server.go:152 +0x1128
Dec 04 23:32:53 labs-provision dr-provision[30920]: main.main()
Dec 04 23:32:53 labs-provision dr-provision[30920]: /home/travis/gopath/src/github.com/digitalrebar/provision/cmds/dr-provision/dr-provision.go:26 +0x97
```

lae
2017-12-04 23:41
I guess one of the machine definitions is causing this

lae
2017-12-04 23:44
or...all of them

shane
2017-12-04 23:46
- the v006 meetup tomorrow (11am PST) agenda is posted at: https://docs.google.com/document/d/1PrdLhpR_AbPEahjRC4AoDvR3NPcvKrCYb95leOYy-Iw Cool demo on Immutable Kubernetes via the new "KRIB" (Kubernetes Rebar/RackN Immutable Bootstrapping)

lae
2017-12-04 23:48
```
{"Validated":false,"Available":false,"Errors":[],"ReadOnly":false,"Meta":null,"Name":"some.example.machine","Description":"","Uuid":"e313c00d-0f9f-4bd2-913b-4e0f78baaf9a","CurrentJob":"","Address":"1.1.1.1","Stage":"none","BootEnv":"local","Profiles":["some_profile"],"Profile":{"Validated":false,"Available":false,"Errors":null,"ReadOnly":false,"Meta":null,"Name":"","Description":"","Params":null},"Tasks":[],"CurrentTask":0,"Runnable":false,"Secret":"0SXxivTmio-jB00w"}
```
here's an example definition in `/var/lib/dr-provision/digitalrebar/machines` that's causing DRP not to start

shane
2017-12-04 23:49
@lae - I'm guessing you're trying to update to the newly released v3.4? There were some structural changes, and it's possible that some of the content needs to be changed to match ....

lae
2017-12-04 23:50
yeah


lae
2017-12-04 23:50
Oh, i was looking at "stable" doc

lae
2017-12-04 23:52
actually never mind I don't see any 3.2 to 3.4 specific instructions on RTD

lae
2017-12-04 23:54
DRP was still starting with existing content, probably since I was following tip for a bit (though I didn't check to see if they were functional). It's just the machine definitions that seem to be causing an issue and I don't quite see anything in the release notes about that

shane
2017-12-04 23:55
going to need @greg and/or @vlowther to wade in on that

greg
2017-12-05 01:22
@lae - what is the error?

greg
2017-12-05 01:22
nvm - read farther up

greg
2017-12-05 01:23
I'll fix it.

greg
2017-12-05 01:23
Your machines have `null` metadata.

greg
2017-12-05 01:23
I'll fix it.

greg
2017-12-05 01:23
We assume something.

shane
2017-12-05 01:24
I suppose that's better than assuming everything ....

shane
2017-12-05 01:24
... or nothing ?

greg
2017-12-05 01:27
With 3.3.0, objects populate all fields all the time. For the most part, it is okay. I'll fix the spot that isn't.

nguyenhappy92
2017-12-05 01:55
has joined #json

nguyenhappy92
2017-12-05 01:57
I installed following the guide at http://provision.readthedocs.io/en/latest/doc/quickstart.html but I get an error. After installing with `curl -fsSL get.rebar.digital/stable | bash -s -- install --isolated`, I ran `./drpcli bootenvs uploadiso centos-7-install`


nguyenhappy92
2017-12-05 01:57
Please advise on how to resolve this issue. Thank you!

shane
2017-12-05 01:57
@nguyenhappy92 welcome to the Digital Rebar Provision (DRP) #community

shane
2017-12-05 01:58
Please ensure you have started your DRP Endpoint service on the host first. After the initial quickstart install, you must still start the service up

shane
2017-12-05 01:59
you can verify if the service is running on that host, with: `ps -ef | grep -v grep | grep dr-provision`

shane
2017-12-05 02:00
the appropriate start command should have been sent to your shell/terminal when the installer ran - but it should be something like: ```sudo ./dr-provision --static-ip=<SOME IP ADDRESS> --file-root=`pwd`/drp-data/tftpboot --data-root=`pwd`/drp-data/digitalrebar --local-store="" --default-store=""``` When run from the directory that you performed the install in

shane
2017-12-05 02:01
the `<SOME IP ADDRESS>` portion should be an IP address on the node that is running `dr-provision` - and is the network that you will be doing provisioning activities on

nguyenhappy92
2017-12-05 02:01
It's running, but `./drpcli bootenvs uploadiso ubuntu-16.04-install` or `./drpcli bootenvs uploadiso centos-7-install` returns: Error: GET: bootenvs/ubuntu-16.04-install: Not Found

shane
2017-12-05 02:01
if you have a single interface dr-provision service - then you can safely leave that out of the command line

shane
2017-12-05 02:02
can you please provide the output of: `./drpcli bootenvs show ubuntu-16.04-install` ??

nguyenhappy92
2017-12-05 02:09
Error: GET: bootenvs/ubuntu-16.04-install: Not Found

shane
2017-12-05 02:09
do you have the exact command you used to install with ?

shane
2017-12-05 02:10
specifically - did you use the `--no-content` flag ?

nguyenhappy92
2017-12-05 02:12
I didn't use the `--no-content` flag. I used `curl -fsSL get.rebar.digital/stable | bash -s -- install --isolated`. Can you give me the command to install this?

shane
2017-12-05 02:15
can you please provide output of: `./drpcli contents list | jq '.[].meta.Name'` (you need the `jq` tool installed on the endpoint)

nguyenhappy92
2017-12-05 02:23
"BackingStore" "LocalStore" "DefaultStore" "BasicStore"

shane
2017-12-05 02:36
@nguyenhappy92 - you are missing the "content" that includes the BootEnvs for some reason

shane
2017-12-05 02:36
have you used the Web UX yet with your endpoint ?

shane
2017-12-05 02:37
the easiest way to get "content" installed is via the UX, via the "contents" panel

shane
2017-12-05 02:38
if you visit: `https://<IP_ADDRESS_OF_YOUR_DRP_ENDPOINT>:8092`

nguyenhappy92
2017-12-05 02:38

shane
2017-12-05 02:38
that is port 8091 - can you please visit HTTPS port 8092 for the same IP addr

nguyenhappy92
2017-12-05 02:38
Above present website

shane
2017-12-05 02:39
the web service on port 8091 is the TFTP directory contents

shane
2017-12-05 02:40
port 8092 is the API - and will also redirect a connection (for HTTPS) to the RackN Portal

nguyenhappy92
2017-12-05 02:40

shane
2017-12-05 02:40
log in to your endpoint with the user/pass (use defaults if you didn't change them)

nguyenhappy92
2017-12-05 02:40
yeah

shane
2017-12-05 02:40
nice

shane
2017-12-05 02:40
now go to `Contents` in lower left corner

shane
2017-12-05 02:41
(sorry - `Content Packages`)

shane
2017-12-05 02:41
you should see `drp-community-content` in the right side panel - click on the `Transfer` button for that content pack

nguyenhappy92
2017-12-05 02:43
I have transferred it

shane
2017-12-05 02:44
you should be able to do the "uploadiso" commands from the command line now

shane
2017-12-05 02:44
for some reason the `drp-community-content` was missing - this includes the BootEnvs with the Operating System installation content necessary to be able to install CentOS/Ubuntu systems

shane
2017-12-05 02:45
usually this is installed by default during the installation phase, as long as the `--no-content` option is NOT specified

nguyenhappy92
2017-12-05 02:45
oh

nguyenhappy92
2017-12-05 02:46
Now I will finish other configuration

nguyenhappy92
2017-12-05 02:46
thank you.

shane
2017-12-05 03:00
you're welcome

greg
2017-12-05 03:45
@lae - fix going through build tests now.

greg
2017-12-05 04:22
builds are all cranking. I'll let the channel know when they are finished.

greg
2017-12-05 04:51
- 3.4.1 is released. Should address @lae's issue.

nguyenhappy92
2017-12-05 06:39
Hi all,

nguyenhappy92
2017-12-05 06:39
why does running `drpcli machines list` return not found?

nguyenhappy92
2017-12-05 06:40

nguyenhappy92
2017-12-05 06:40
can you help me solve this?

nguyenhappy92
2017-12-05 06:40
thank you

nguyenhappy92
2017-12-05 06:41

greg
2017-12-05 15:01
@nguyenhappy92 - have you PXE booted a machine against the DRP instance? Where are your machines coming from? Are they physical? Are they virtual? Where is DRP running? have you configured a subnet?

lae
2017-12-05 15:32
@greg thank you, just pushed 3.4.1 and it's functioning

greg
2017-12-05 15:32
Cool! Thanks for testing it.

2017-12-05 15:54
trying to run digitalrebar latest stable ... for some reason dr_pr<ovisioner:master> keeps restarting

2017-12-05 15:54
ever seen that ?

greg
2017-12-05 15:54
Yes - it usually means I didn't handle some start option. If possible check the logs and see what it is complaining about.

greg
2017-12-05 15:55
Make sure you are v3.4.1

shane
2017-12-05 15:55
you can check for version via `dr-provision --version`

2017-12-05 15:59
and where are the logs situated ? dummy question :-)

vlowther
2017-12-05 16:00
How are you running it?

2017-12-05 16:01
under Ubuntu with all the docker containers running

vlowther
2017-12-05 16:01
That sounds like DRv2

vlowther
2017-12-05 16:01
You probably don't want that.

shane
2017-12-05 16:01
Hmmm ... smells like time for an upgrade ... :slightly_smiling_face:

2017-12-05 16:02
well I cloned GitHub.com/digitalrebar

2017-12-05 16:02
and installed as per documentation ./run-in-system ... yadi-yadi-yada


2017-12-05 16:04
Either way, the logs for DRv2 are accessible via docker-compose

2017-12-05 16:04
cd into deploy/compose and run docker-compose logs -f dr-provision to see them.

2017-12-05 16:05
However, we are deprecating digitalrebar/digitalrebar in favor of digitalrebar/provision.

shane
2017-12-05 16:05
...and you are playing with Digital Rebar ver2 - and it's old - the new goodness is Digital Rebar Provision ver3 (DRPv3)

2017-12-05 16:05
The usual reason for the dr-provision container to not start involves not being able to grab the ports it needs

shane
2017-12-05 16:06
if you do choose to upgrade to DRPv3 - see our quickstart guide to get started: http://provision.readthedocs.io/en/latest/doc/quickstart.html

2017-12-05 16:07
yeah reason I wanted to run digitalrebar was for IPMI management and local ux

2017-12-05 16:09
if I pull docker file from dockers-hub on provision ... is it DRPv3 or older ?

greg
2017-12-05 16:09
Well - DRP has IPMI management in a couple of forms and the UX can be run locally, but you should talk to us about that use case.

greg
2017-12-05 16:09
DRP is a completely different deployment structure.

greg
2017-12-05 16:10
DRP is a single go binary.

2017-12-05 16:10
They are entirely separate products

2017-12-05 16:11
ok will try DRPv3 in 2 mins

shane
2017-12-05 16:14
please make sure you do not have any IPtables rules blocking ports 67, 69, 8091, and 8092 - if you re-use the node that you tried to install DRv2 on ...

lae
2017-12-05 16:14
(yeah the single go binary is my number one reason why I heavily prefer DRP over the previous DRv2 :sweat_smile: )

greg
2017-12-05 16:14
@lae - were you using DRv2?

lae
2017-12-05 16:14
attempting to

lae
2017-12-05 16:14
lol

lae
2017-12-05 16:15
shortly before DRP was published?

lae
2017-12-05 16:15
so I guess my timing was apt

lae
2017-12-05 16:16
I got DRv2 to work for a bit but I recall having some issues where I couldn't exactly replace my existing setup with Pixiecore and an unmaintained project called Waitron and ultimately gave up

greg
2017-12-05 16:16
makes sense

2017-12-05 17:04
up and running

2017-12-05 17:05
did latest stable install

2017-12-05 17:05
dr-provision2017/12/05 17:04:50.218192 Version: v3.4.1-0-8d49a776c3d7b40d2af07a356e7b33d2e2b99ca2

greg
2017-12-05 17:06
Great!

zehicle
2017-12-05 18:26
"up and running" < that should be the DRP slogan

zehicle
2017-12-05 18:26
not very creative tho

shane
2017-12-05 18:49
- Digital Rebar meetup v006 starts in just over 10 mins .... hope you can join us, via the Zoom link: https://zoom.us/j/3403934274

daniel.bernier
2017-12-05 20:34
has joined #json

shane
2017-12-05 21:35
the v006 meetup video is posted now - if you missed meetup and curious to learn about our Immutable Kubernetes solution ... etc... check it out: https://youtu.be/Z4jjN1wCtCM

daniel.bernier
2017-12-05 22:10
Hi any reason why I can't run ../drpcli from docker exec ?

daniel.bernier
2017-12-05 22:11
docker exec -ti hungry_curie "./drpcli bootenvs uploadiso sledgehammer" oci runtime error: exec failed: container_linux.go:265: starting container process caused "exec: \"./drpcli bootenvs uploadiso sledgehammer\": stat ./drpcli bootenvs uploadiso sledgehammer: no such file or directory"

daniel.bernier
2017-12-05 22:13
forget it

daniel.bernier
2017-12-05 22:13
works

schwartz.kylej
2017-12-06 00:48
has joined #json

vlowther
2017-12-06 13:47
Yeah, no containers for drp.

vlowther
2017-12-06 13:48
For drv2, we settled on containers as a delivery mechanism and as a basis for eventual distributed/ha work.

vlowther
2017-12-06 13:49
We decided that the additional overhead that required was too much work for drp, and that we should aim for a single binary deploy mechanism

vlowther
2017-12-06 13:52
In drv2 we were originally a rails + other things stack, and containers were a great way to provide the operating environment we wanted.

vlowther
2017-12-06 13:53
When we started refactoring parts out into smaller services written in go, that specialized environment aspect was less relevant.

vlowther
2017-12-06 13:56
When we decided to split out the provisioning bits of drv2 into their own product, Greg and I had enough expertise in go to be able to combine the bits we needed into a single binary that embedded everything we needed to bring the service up.

vlowther
2017-12-06 13:57
Which made the requirement we used to have for containers (and some of the UDP related issues they caused) go away.

vlowther
2017-12-06 13:59
Also hi from kubecon.:grinning:

zehicle
2017-12-06 14:22
Note: There are users who package DRP in containers. It can work fine. It adds extra complexity for new users that can trip them up.

shane
2017-12-06 14:27
@schwartz.kylej welcome

daniel.bernier
2017-12-06 15:31
thanks guys - have v3.4.1 running in docker

zehicle
2017-12-06 16:08
I think there's a dockerfile in the project @daniel.bernier - maybe too late to help

zehicle
2017-12-06 16:09
if you missed the community meetup -> there's a critical discussion half way in about atomic updates that's worth listening to

daniel.bernier
2017-12-06 16:13
will listen to it later tonight

daniel.bernier
2017-12-06 16:13
yes there is one, I forked it to run on stable instead of tiop

daniel.bernier
2017-12-06 16:13
tip

wdennis
2017-12-06 17:36
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F8AJ2U1UK/unknown-endpoint-notice.png and commented: Just upgraded to v3.4.1 - Any reason my endpoint is showing as "unknown"?

greg
2017-12-06 17:37
May not have updated the SaaS database for that version.

wdennis
2017-12-06 17:38
How to check?

greg
2017-12-06 17:38
that is a problem on our side. It is fine. The SaaS database doesn't know to say that it is latest.

greg
2017-12-06 17:38
It is fine. Trust me. :slightly_smiling_face:

wdennis
2017-12-06 17:40
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F8AMLGQ76/screen_shot_2015-05-12_at_3.31.31_pm.png and commented: Oh I trust you, @greg :wink:

zehicle
2017-12-06 18:01
I did the 3.4.0. Missed the update

greg
2017-12-06 18:02
I need to learn it.

daniel.bernier
2017-12-07 15:18
is there a way to comment reservations? Using MAC as a UID is not trivial

greg
2017-12-07 15:35
Currently, the reservation object doesn't have a description or notes field.

greg
2017-12-07 15:36
@daniel.bernier - it does have a Meta struct that you could abuse, but it isn't displayed in the UI.

greg
2017-12-07 15:36
This is a good feature request.

daniel.bernier
2017-12-07 15:36
ah great

daniel.bernier
2017-12-07 15:36
saw it through drpcli

greg
2017-12-07 15:37
I have a PR that will make the Meta field easily editable. It is a little hard to manipulate in stable.

greg
2017-12-07 15:37
Though a description field is easy to add and non-breaking to the API.

greg
2017-12-07 15:39
`drpcli reservations update <IP> '{ "Meta": { "desc": "my notes" } }'`

greg
2017-12-07 15:39
But I think that will only work once my tip changes make it in.

greg
2017-12-07 15:40
Now, another way to manipulate reservations (from config as code perspective) is to make a content bundle.

greg
2017-12-07 15:40
You can then set meta in that.

greg
2017-12-07 15:42
```
- Addr: 192.168.1.1
  Meta:
    Description: Fred Rules
  Strategy: MAC
  Token: aa:bb:cc:dd:ee:ff
```

greg
2017-12-07 15:42
those are a little more complex topics.

greg
2017-12-07 15:43
Maybe we should write up a `build your content bundle` doc

shane
2017-12-07 15:45
@greg - I have that in my list of doc thingz to tackle

daniel.bernier
2017-12-07 17:51
ok other newbie question for you all

daniel.bernier
2017-12-07 17:52
discovery with sledgehammer keeps in failed mode and I have no job-logs

daniel.bernier
2017-12-07 17:54
found that the discovery job starts a gohai task, and that task refers to a template which does not exist, it seems

zehicle
2017-12-07 18:54
@daniel.bernier sounds like a bug or missed components some where - all of RackN is at KubeCon, so responses will be slower in community channel

daniel.bernier
2017-12-07 18:54
no prob

daniel.bernier
2017-12-07 18:54
found a way around it

greg
2017-12-07 19:34
@daniel.bernier when you can, what was the issue and workaround?

daniel.bernier
2017-12-07 19:41
I just bypassed gohai

daniel.bernier
2017-12-07 19:41
which was not a clean fix

wdennis
2017-12-07 23:28
Status: loving v3.4.1 :slightly_smiling_face:

wdennis
2017-12-07 23:28
Now I just gotta get that custom ks/preseed goodness

shane
2017-12-07 23:34
Hopefully we'll have that for you next week....

ctrees
2017-12-08 13:47
So any kubecon 2017 reports ? I'd especially like to hear about storage ideas :wink:

greg
2017-12-08 13:50
Wouldn't we all

greg
2017-12-08 13:51
:grinning:

greg
2017-12-08 13:52
Haven't seen much. Persistent claim controllers and bridges to storage, but not many details.

daniel.bernier
2017-12-08 15:06
ok is there a repository for examples such as part-scheme JSONs, etc.?

greg
2017-12-08 15:38
Not really. We have the default one. We may build up more over time. We are open to adding more.

daniel.bernier
2017-12-08 18:49
ok other bug

daniel.bernier
2017-12-08 18:55
if I clone ubuntu-16.04-install bootenvs I lose the InitRds value

daniel.bernier
2017-12-08 18:55
through UI even in Edit Mode I cannot define

daniel.bernier
2017-12-08 18:56
through DRPCLI I get the following error

daniel.bernier
2017-12-08 18:56
provision # ./drpcli bootenvs update "etg-ubuntu-16.04-install" {"Initrds":"install/netboot/ubuntu-installer/amd64/initrd.gz"}
Error: Failed to generate changed bootenvs:etg-ubuntu-16.04-install object: invalid character 'I' looking for beginning of object key string

greg
2017-12-08 19:40
Make sure you use single quotes around the JSON blob.

daniel.bernier
2017-12-08 20:28
got this instead Error: Failed to generate changed bootenvs:etg-ubuntu-16.04-install object: json: cannot unmarshal string into Go struct field BootEnv.Initrds of type []string

greg
2017-12-08 20:35
Make sure Initrds is a list
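
Both errors above can be reproduced without drpcli. A minimal sketch, assuming only that the payload must be valid JSON and that Initrds is a list of strings (which is what the `[]string` in the Go error suggests). `check_bootenv_patch` is a hypothetical helper for illustration, not part of drpcli:

```python
import json

def check_bootenv_patch(blob):
    """Return a list of problems found in a JSON patch for a bootenv (hypothetical helper)."""
    try:
        patch = json.loads(blob)
    except json.JSONDecodeError as err:
        # Typical result of passing the blob to the shell without single quotes:
        # the shell strips the double quotes and what is left is no longer JSON.
        return ["not valid JSON: " + err.msg]
    if not isinstance(patch, dict):
        return ["patch must be a JSON object"]
    problems = []
    initrds = patch.get("Initrds")
    if initrds is not None and not (
        isinstance(initrds, list) and all(isinstance(i, str) for i in initrds)
    ):
        problems.append("Initrds must be a list of strings")
    return problems

# What the shell passes on when the blob is not single-quoted.
unquoted = "{Initrds:install/netboot/ubuntu-installer/amd64/initrd.gz}"
# Valid JSON, but Initrds is a bare string.
as_string = '{"Initrds": "install/netboot/ubuntu-installer/amd64/initrd.gz"}'
# Valid JSON with Initrds as a list.
as_list = '{"Initrds": ["install/netboot/ubuntu-installer/amd64/initrd.gz"]}'

for blob in (unquoted, as_string, as_list):
    print(check_bootenv_patch(blob))
```

Only the last form passes both checks, so the command would presumably take the shape `drpcli bootenvs update etg-ubuntu-16.04-install '{"Initrds": ["install/netboot/ubuntu-installer/amd64/initrd.gz"]}'`.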

greg
2017-12-08 20:36
@daniel.bernier

greg
2017-12-09 19:32
- Hi All, the tip of the trees has been updated with all the stuff we showed and talked about at KubeCon. Also a couple of fixes in tip for gohai and some others. If you pick up *tip* content, you must use a *tip* DRP.

i.grischott
2017-12-09 19:55
thanks.

daniel.bernier
2017-12-09 20:10
Hi how can I debug a bootenv install ?

i.grischott
2017-12-09 21:43
what is wrong if i uploaded the iso..then


nkabir
2017-12-10 18:41
has joined #json

zehicle
2017-12-10 20:12
Welcome @nkabir

zehicle
2017-12-10 20:20
@i.grischott try restarting the service to see if it finds the iso. Most likely, the bootenv has a different iso reference

greg
2017-12-10 21:20
@i.grischott - the last error indicates that explode iso failed because the debian-9 iso you added doesn't match the expected iso. This can happen if the iso failed to download cleanly, wasn't the correct one the first time, or was corrupted on upload. Debian is a little different because you have to download from the debian location and rename the file on upload.

i.grischott
2017-12-10 22:22
i use this link


i.grischott
2017-12-10 22:24
direct download to Boot ISOs would be nice..

i.grischott
2017-12-10 22:27
i would upload Container Linux ISO for Boot ISO and implement them for deploying on Bare Metal..

i.grischott
2017-12-10 22:41
i ran the dr-provision in a docker container.. is there an option to run it in silent mode? i don't need to see the output everytime..

greg
2017-12-11 00:05
@i.grischott - the link you used in the picture is for debian 8, but you reference debian 9.

greg
2017-12-11 00:05
@i.grischott - CoreOS is possible and I have done it before, but it is a little tricky in certain ways.

greg
2017-12-11 00:06
@i.grischott - The primary noise is for API calls. You can turn the API logging to none in the system prefs page of the UI.

i.grischott
2017-12-11 13:10
hi, thanks. i tried with debian 8 and 9 the same.. i downloaded from:


i.grischott
2017-12-11 13:10
or



i.grischott
2017-12-11 13:11
iso content:


i.grischott
2017-12-11 13:12
there is no subdir install/linux


wdennis
2017-12-11 13:15
Don't you have to rename the Debian mini.iso before exploding it? (Downloads as just "mini.iso", but DRP expects it to be named "debian-[8,9]-amd64-mini.iso")

wdennis
2017-12-11 13:17
Not sure if downloading it thru the UX does the renaming or not

wdennis
2017-12-11 13:21
After renaming, have to HUP the dr-provision process to get it to inspect and re-process the isos

i.grischott
2017-12-11 13:27
i restarted the dr-provision server.. i tested with mini.iso and tested after renaming..

i.grischott
2017-12-11 13:42
i ran the provisioner server on container.. like this:


i.grischott
2017-12-11 14:09
The ISO explode step for Debian searches for an "install" directory, but there is no install directory on the ISO.



i.grischott
2017-12-11 14:14
on another mirror there is an ISO with the subdir install.amd with the files initrd.gz, vmlinux...

greg
2017-12-11 14:19
It will explode it into install

greg
2017-12-11 15:04
@i.grischott - Did you use the UX for all of this?

greg
2017-12-11 15:20
Okay - @i.grischott - there is a drp bug when using the iso upload feature. It doesn't use the name from the UX. It uses the filename from the filesystem. I'm fixing that.

greg
2017-12-11 15:21
Additionally, it appears that the debian images have been updated, so the checksums don't match anymore. I'll fix that as well.

i.grischott
2017-12-11 15:24
:+1:

ctrees
2017-12-11 15:49
Sort of off-subject, but the 'rackn' crew was doing heavy bind (and ansible)... is there an ansible-galaxy 'author' or 'package' you'd use for BIND ?

ctrees
2017-12-11 15:50
or better yet... is RackN pushing ansible playbook stuff back to galaxy :wink:

greg
2017-12-11 15:52
We rolled our own in a container in the DRv2 code base. We also had a go-binary that managed to give us a RESTful API that could config DNS.

greg
2017-12-11 15:52
The go-binary could talk nsupdate to raw things, PowerDNS, and local bind config.

greg
2017-12-11 15:52
The container just had bind and the go-binary. We didn't really roll much ansible for that.

ctrees
2017-12-11 16:10
ok... yea I was looking for patterns for naming and service discovery... I'll look into the k8s patterns... kubespray must have to deal with it somehow

ctrees
2017-12-11 16:11
but thanks... I didn't know of PowerDNS

greg
2017-12-11 16:14
@i.grischott - I've updated tip content so the checksum matches the latest. Updating drp-community-content to tip should fix the checksum part.

greg
2017-12-11 16:17
The DRP fix for using the name and not the browser's filesystem name is trickling through the system. Hopefully within the next hour or so it will complete.

shane
2017-12-11 19:14
- if you're interested in Kubernetes ... and Digital Rebar (!!) you might like to check out our Webinar we'll be running on Immutable Kubernetes with Digital Rebar Provision: http://bit.ly/2BCpFGk

zehicle
2017-12-12 22:23
If you missed the 5-minute Kubernetes install we did at Kubecon, here's a recap of the demo https://youtu.be/OMm6Oz1NF6I

zehicle
2017-12-12 22:23
we'll go into more detail on Thursday

zehicle
2017-12-13 05:49
Small update for the UX - we added dynamic updates to all the list views today. If things are running behind the scenes, you will see live changes as they happen on the lists. This is very helpful on the machines & bulk edit screens.

ctrees
2017-12-13 14:54
So, THAT is the ssh-less method? (aka all the config is in the local node setup via DRP... THEN it's handed over to kubectl... look ma, no node ssh!)

ctrees
2017-12-13 14:57
so all machine control comms are through kubectl via the proxy you set up, and if you want to pull a node out, you use the DRP IPMI ?? to force PXE reboot at which time sledge can set up a new local config ??

greg
2017-12-13 14:59
yes - though - you could build a new workflow, that drained the node, and offlined it, and then rebooted it back to sledgehammer.

ctrees
2017-12-13 15:09
wait... in the workflow that 'drained the node' how would that work ? seems like you'd have to talk to kubectl ? so DRP would need to know of the gateway that had the kubectl (or was that why rob started the local kube proxy)...

ctrees
2017-12-13 15:10
I should say in your new workflow...

greg
2017-12-13 15:10
Well, the local kubectl proxy is so that he could access the UI.

greg
2017-12-13 15:11
I'd have to check what powers the kubelet cert conf file has. I could see two workflow styles being built.

ctrees
2017-12-13 15:11
Oh... right... in your thought example you would need DRP to know about that mech ?? correct ??

greg
2017-12-13 15:12
1. single-node workflow where the kubelet cert has enough access to run kubectl on itself to drain, mark offline, and then reboot. That is really easy.

greg
2017-12-13 15:12
actually, nvm. This is "easy".

greg
2017-12-13 15:12
Workflow is this:

greg
2017-12-13 15:12
create a task called `drain-me`.

greg
2017-12-13 15:12
as a template.

greg
2017-12-13 15:13
Use the Profile Token expansion to get a token to read the admin conf creds. Download kubectl if not already present.

greg
2017-12-13 15:13
call kubectl drain (admin conf has ip of admin in it).

greg
2017-12-13 15:13
call kubectl to mark the node offline

greg
2017-12-13 15:14
both with admin creds pulled from DRP Profile (with limited access token).

greg
2017-12-13 15:14
Take that task and create a stage, "decommission-k8s-node", and put the drain-me task on it.

greg
2017-12-13 15:15
Then create a workflow `decommission-k8s-node -> discover:Reboot`

greg
2017-12-13 15:15
When you are done with a node, set the stage to `decommission-k8s-node`.

greg
2017-12-13 15:15
The node will drain, offline, and reboot back into discovery.

greg
2017-12-13 15:16
Or you could set the stage to `mount-local-disks` if you wanted to re-add it directly to the cluster.

greg
2017-12-13 15:16
The decommission stage could also remove the profile if you really wanted to clean it.

greg
2017-12-13 15:17
So, you could update k8s this way, I think, as well.

greg
2017-12-13 15:17
The master is a little sketchy in this.

ctrees
2017-12-13 15:18
aw... the 'key' to my understanding is "call kubectl drain (admin conf has ip of admin in it)."

greg
2017-12-13 15:19
So, the profile that is the shared write space for the cluster has the admin.conf file in it.

ctrees
2017-12-13 15:19
basically in my head, I was attempting to follow the command protocols that are talking when you're not ssh'ing to a local user....

greg
2017-12-13 15:19
well - this wouldn't require ssh.

greg
2017-12-13 15:19
It requires that the DRP runner is still running.

greg
2017-12-13 15:19
Then the task execution will run if something shows up

ctrees
2017-12-13 15:22
Awh... the DRP runner is running as local auth on the node... which was what rob mentioned but I was not sure how...

ctrees
2017-12-13 15:23
and I get you also have the flexibility to NOT do that :wink:

greg
2017-12-13 15:23
Yes, but you lose take-over actions (other than reboot back to sledgehammer).

greg
2017-12-13 15:24
For the ram-only case, sledgehammer-wait is still running the runner.

greg
2017-12-13 15:24
for the centos-install case, the runner-server stage has the runner as a systemd service that runs on startup in the follow-on boot and is left running as a consequence of the complete stage.

vlowther
2017-12-13 18:43
Coming Soon to a tip DRP near you: port aliveness and availability checking with drpcli info status

vlowther
2017-12-13 18:46
Soonish: runtime stats via event stream.

vlowther
2017-12-14 21:39
So, next cool in-progress thing: the ability to run DRP without needing to upload ISO images or sledgehammer

vlowther
2017-12-14 21:40
I am adding the ability for DRP to transparently proxy the HTTP and TFTP requests needed to PXE boot systems to user-defined remote URLs.

vlowther
2017-12-14 21:41
This piggy-backs on top of the package-repositories support I added a month or so ago.

vlowther
2017-12-14 21:43
As long as the kernel and initrd files are present in the remote repo in the relative locations defined by the bootenv, DRP will be able to act as an intermediate proxy for all incoming TFTP and HTTP requests for these files.

zehicle
2017-12-14 23:05
@vlowther so could all the packet DRPs share a sledgehammer then?

vlowther
2017-12-14 23:06
Yep.

ctrees
2017-12-15 14:08
So the Endpoint can run on a pie now :wink:

greg
2017-12-15 14:10
Could for a while. :slightly_smiling_face:

greg
2017-12-15 14:11
Just in time for the Holiday Season

marc.heckmann
2017-12-15 16:20
has joined #json

florent.wagener
2017-12-15 16:24
has joined #json

marc.heckmann
2017-12-15 16:27
Hello. I'm just starting to play with DR provision and I have a quick question: I've got everything up and running correctly (easy :thumbsup: ), but now I would simply like to start testing by adding a new bootenv. I can't seem to make that happen. Either through the UX, the CLI or the Swagger UI. I keep getting errors. I'd like to add a simple CoreOS bootenv, but I have a hard time wrapping my head around the Templates section of the bootenv and the relationship between Template in bootenvs and separate Template section.

shane
2017-12-15 16:27
@marc.heckmann and @florent.wagener welcome

marc.heckmann
2017-12-15 16:27
thanks.

florent.wagener
2017-12-15 16:28
@shane thanks :slightly_smiling_face:

shane
2017-12-15 16:28
have you used the UX yet ?

marc.heckmann
2017-12-15 16:28
yes.

shane
2017-12-15 16:28
it makes "following" the flow easier - as you can "expand" or "click through" to each of the sub-parts

shane
2017-12-15 16:29
start w/ BootEnvs - example of centos or ubuntu

marc.heckmann
2017-12-15 16:29
But I keep getting an error about Template ID when I clone -> edit a new Bootenv

shane
2017-12-15 16:29
hmmm ... the UX is still in "Beta" - and has some rough edges - but I thought we had ironed out that one recently

florent.wagener
2017-12-15 16:30
```Error Templates[3]: No common template for Template ID``` is what @marc.heckmann is talking about

marc.heckmann
2017-12-15 16:30
Yes and some others too. A bunch of stuff around the templates.

florent.wagener
2017-12-15 16:30
I have the same issue trying to clone the default discovery bootenv.

florent.wagener
2017-12-15 16:30
(we work together btw)

shane
2017-12-15 16:31
from the CLI - you can do similar actions:
```
drpcli bootenvs list
drpcli bootenvs list | jq '.[].Name'
drpcli bootenvs show centos-7-install
drpcli bootenvs show centos-7-install > example-bootenv.json
# vim example-bootenv.json
drpcli bootenvs create - < example-bootenv.json
```

marc.heckmann
2017-12-15 16:31
I guess more generally there is a bunch of stuff that isn't clear to me about the role of templates in the bootenv vs the other common templates.

marc.heckmann
2017-12-15 16:31
+ how the chaining of pxelinux/elilo/ipxe is supposed to work

marc.heckmann
2017-12-15 16:32
If I look at the standard bootenvs that are shipped, they seem to define templates in them and how is that different from the other "common" templates?

marc.heckmann
2017-12-15 16:34
I will try your suggested drpcli flow


marc.heckmann
2017-12-15 16:35
I read through it, yes.

marc.heckmann
2017-12-15 16:36
This line is pretty clear: "The templates can be in-line in the BootEnv object or reference a Template. "

marc.heckmann
2017-12-15 16:36
but I can't get the referencing to work

greg
2017-12-15 16:36
template must exist first.

marc.heckmann
2017-12-15 16:37
it does

greg
2017-12-15 16:37
ok - just making sure.

marc.heckmann
2017-12-15 16:37
Another question: Why do the elilo/pxelinux/ipxe templates seem to be required?

marc.heckmann
2017-12-15 16:38
How are they chained together in the flow?

greg
2017-12-15 16:41
I can describe it. It will be 30 minutes or so before I can get to it.

marc.heckmann
2017-12-15 16:42
ok. NP :slightly_smiling_face:

vlowther
2017-12-15 16:49
The elilo/pxelinux/ipxe templates are required because we need to be able to render them on a per-machine basis to have machines boot into the "right" boot environment with the proper parameters.

vlowther
2017-12-15 16:50
The ones we provide should do the right thing in the majority of cases.

marc.heckmann
2017-12-15 16:51
ok. But in the flow what is the actual filename sent by TFTP in the PXE flow: pxelinux?

marc.heckmann
2017-12-15 16:51
I'm assuming so

vlowther
2017-12-15 16:52
To include a template in-line on a bootenv, leave the ID field blank and provide the template to be expanded inline in the Contents field

vlowther
2017-12-15 16:52
To use a shared template, have the ID field be the same as the ID of the shared template, and leave Contents blank

vlowther
2017-12-15 16:53
It is an error to leave both blank or have both filled out -- you will get an explanatory error if you do that.
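
The ID/Contents rule above boils down to "exactly one of the two". A minimal sketch, assuming a template entry is a plain dict with the ID, Contents and Path fields mentioned in this conversation. The `Name` field, the template ID `my-shared-pxelinux.tmpl` and the error strings are made up for illustration and are not what DRP actually prints:

```python
def check_template_entry(entry):
    """Return None if the entry is well-formed, else a description of the problem."""
    has_id = bool(entry.get("ID"))
    has_contents = bool(entry.get("Contents"))
    if has_id and has_contents:
        return "both ID and Contents are set; use one or the other"
    if not has_id and not has_contents:
        return "neither ID nor Contents is set"
    return None

# In-line template: Contents filled in, ID left blank.
inline = {
    "Name": "pxelinux",
    "Path": "pxelinux.cfg/{{.Machine.HexAddress}}",
    "ID": "",
    "Contents": "DEFAULT discovery\n",
}

# Shared template: ID references an existing template, Contents left blank.
shared = {
    "Name": "pxelinux",
    "Path": "pxelinux.cfg/{{.Machine.HexAddress}}",
    "ID": "my-shared-pxelinux.tmpl",  # hypothetical ID; the template must already exist
    "Contents": "",
}

for entry in (inline, shared, {"Name": "broken"}):
    print(check_template_entry(entry))
```

Running a check like this over each item in the `Templates` list before `drpcli bootenvs create` is one way to narrow down which entry an error such as `Templates[3]` refers to.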

marc.heckmann
2017-12-15 16:53
Why have both possibilities though? Why not just use a reference?

marc.heckmann
2017-12-15 16:54
Sorry, I guess it's a little confusing for a first time user :slightly_smiling_face:

marc.heckmann
2017-12-15 16:54
But I get it now

vlowther
2017-12-15 16:54
No worries.

vlowther
2017-12-15 16:55
The actual name that you would fetch the template by over TFTP or HTTP is the Path field

marc.heckmann
2017-12-15 16:55
So I managed to successfully create a new bootenv. Time to test it!

marc.heckmann
2017-12-15 16:55
ok, but which one is the one pointed to by the "filename" DHCP option?

marc.heckmann
2017-12-15 16:56
and after that, where do ipxe + elilo fit?

vlowther
2017-12-15 16:57
http://provision.readthedocs.io/en/latest/doc/arch/data.html#rs-data-architecture is the best guide for what the various things you can expand in a template are.

vlowther
2017-12-15 16:57
That is the fun part.

vlowther
2017-12-15 16:59
well, the "filename" option by default points at lpxelinux.0

marc.heckmann
2017-12-15 16:59
right, so if I recall from that document, lpxelinux.0 or EFI bootx64 is first sent

vlowther
2017-12-15 16:59
and pxelinux does the usual tftp waterfall to find a config file

marc.heckmann
2017-12-15 16:59
And then how do we get to ipxe?

vlowther
2017-12-15 17:00
and it finds the config file we wrote for it at pxelinux.cfg/{{.Machine.HexAddress}}
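
The "tftp waterfall" can be sketched as follows. This assumes the standard pxelinux search order (hardware type plus MAC, then the upper-case hex IPv4 address shortened one digit at a time, then `default`) and assumes `{{.Machine.HexAddress}}` expands to that hex form of the machine's address. The client-UUID step pxelinux may try first is left out, and the address and MAC are example values:

```python
import ipaddress

def hex_address(ip):
    """Upper-case hex form of an IPv4 address, e.g. 192.168.1.10 -> C0A8010A."""
    return "%08X" % int(ipaddress.IPv4Address(ip))

def pxelinux_search_order(ip, mac):
    """Config file names pxelinux requests over TFTP, in order (UUID step omitted)."""
    names = ["01-" + mac.lower().replace(":", "-")]  # 01 = Ethernet hardware type
    hexip = hex_address(ip)
    # Full hex address first, then drop one trailing hex digit per attempt.
    names += [hexip[:n] for n in range(len(hexip), 0, -1)]
    names.append("default")
    return ["pxelinux.cfg/" + n for n in names]

for name in pxelinux_search_order("192.168.1.10", "aa:bb:cc:dd:ee:ff"):
    print(name)
```

Under these assumptions the file rendered at `pxelinux.cfg/{{.Machine.HexAddress}}` is the second name tried, so a per-machine config wins over any shorter prefix or `default`.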

vlowther
2017-12-15 17:01
In the default config, we don't use ipxe

marc.heckmann
2017-12-15 17:01
ok, so I'll have to figure out how to start using it

marc.heckmann
2017-12-15 17:02
(It's not strictly required, but just sort of as an experiment)

vlowther
2017-12-15 17:02
I have encountered systems where ipxe failed due to nic firmware bugs/spanning tree issues/other networking issues

marc.heckmann
2017-12-15 17:02
ok.

marc.heckmann
2017-12-15 17:03
Next silly question: How to actually associate my new bootenv to an existing machine that's gone through the discovery process?

marc.heckmann
2017-12-15 17:03
At first glance, it seems that I need to create a "stage" associated w/ my bootenv?

shane
2017-12-15 17:04
that would be the best solution - then you create Workflow to move a machine through the stages to final install point

vlowther
2017-12-15 17:04
You can do that

marc.heckmann
2017-12-15 17:05
So my ideal workflow would be something like: UnknownMachine->Discover->AddtoInventory

marc.heckmann
2017-12-15 17:05
Then another would be TakeMachinefromInventory->DeployOS

marc.heckmann
2017-12-15 17:06
Maybe with a burnin step after discovery

marc.heckmann
2017-12-15 17:06
can we have multiple workflows or should that all be in one workflow?

vlowther
2017-12-15 17:07
You must change stage whenever you want to change boot environment

vlowther
2017-12-15 17:08
Otherwise, the workflow system is pretty flexible.

vlowther
2017-12-15 17:11
@vlowther pinned a message to this channel.

vlowther
2017-12-15 17:11
@vlowther pinned a message to this channel.

vlowther
2017-12-15 17:13
I went ahead and pinned a couple of older messages with examples for DHCP option 67.

vlowther
2017-12-15 17:14
you would need to add/change option 67 in your subnets and/or reservations to tweak what the DHCP server will serve.

marc.heckmann
2017-12-15 17:15
ok, so I can do it per host. cool.

vlowther
2017-12-15 17:15
Yep

vlowther
2017-12-15 17:16
Options for reservations and subnets stack based on the relevant IP address

vlowther
2017-12-15 17:16
You can even create reservations that are outside a subnet

vlowther
2017-12-15 17:17
but you have to provide all the options in that case. :slightly_smiling_face:
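
One way to picture the stacking described above. This is a sketch under two assumptions that the conversation implies but does not spell out: options are keyed by DHCP option code, and a reservation's value overrides the subnet's value for the same code. The data layout, addresses and the `custom-loader.0` filename are hypothetical:

```python
import ipaddress

def effective_options(ip, subnets, reservations):
    """Merge subnet options with reservation options for one address."""
    opts = {}
    addr = ipaddress.ip_address(ip)
    for cidr, subnet_opts in subnets.items():
        if addr in ipaddress.ip_network(cidr):
            opts.update(subnet_opts)
    # Reservation options are applied last, so they win on conflicts.
    opts.update(reservations.get(ip, {}))
    return opts

subnets = {
    "192.168.1.0/24": {3: "192.168.1.1", 67: "lpxelinux.0"},  # 3 = router, 67 = bootfile name
}
reservations = {
    "192.168.1.50": {67: "custom-loader.0"},  # per-host override of option 67
    "10.0.0.5": {67: "lpxelinux.0"},          # outside any subnet: nothing is inherited
}

print(effective_options("192.168.1.50", subnets, reservations))
print(effective_options("10.0.0.5", subnets, reservations))
```

The second call shows why a reservation outside every subnet has to carry all of its own options: there is nothing underneath it to inherit from.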

marc.heckmann
2017-12-15 17:21
ok. So if I understand correctly the profile associated to a machine is the entry point into a given workflow?

marc.heckmann
2017-12-15 17:22
A new unknown machine will always have no profile in which case, the generic "global" profile applies?

vlowther
2017-12-15 17:22
The stage.

marc.heckmann
2017-12-15 17:22
hmm. lost you there.

vlowther
2017-12-15 17:22
Profiles are bags that hold parameters

marc.heckmann
2017-12-15 17:22
ok.

vlowther
2017-12-15 17:22
and machines can have any number of them.

marc.heckmann
2017-12-15 17:23
But looking at the Workflow page in the UX, it seems to be associated to a profile.

vlowther
2017-12-15 17:23
they act as a mechanism to provide parameters that should be shared across some subset of machines.

marc.heckmann
2017-12-15 17:23
ok. So we have a one -> many relation between machines and profiles

shane
2017-12-15 17:24
yes - and the Workflow system uses the `change-stage/map` Param type to manage the workflow stages

shane
2017-12-15 17:25
this Param can be contained in a Profile - which is attached to a Machine ...

shane
2017-12-15 17:25
for example - if you place it in "Global" profile, it applies to ALL machines provisioned - without being explicitly referenced

shane
2017-12-15 17:25
but - if you place it in a "my-workflow" profile - you must attach that Profile to a Machine

shane
2017-12-15 17:26
you can technically attach the Param to the machine directly - not via a Profile, as well ...

marc.heckmann
2017-12-15 17:26
ok. So a machine could have several workflows associated to it?

shane
2017-12-15 17:26
yep

shane
2017-12-15 17:26
you can have "fan-in" and "fan-out" with your workflows

marc.heckmann
2017-12-15 17:26
And how would the ordering of workflows be handled?

shane
2017-12-15 17:26
you can have several workflow steps

shane
2017-12-15 17:27
and you might choose to "orchestrate" those workflows externally - for example -we have a Terraform provider which does exactly that

vlowther
2017-12-15 17:27
fanout is not really solved with the current system. :confused:

shane
2017-12-15 17:27
(on the road map ... :slightly_smiling_face: )

marc.heckmann
2017-12-15 17:27
ok, I don't get what you mean by fan-in vs fan-out

shane
2017-12-15 17:28
the example Terraform provider does (basically): workflow 1: discover --> terraform-ready workflow 2: centos-7-install --> ssh-keys --> complete terraform initiates the change from "workflow 1" to "workflow 2"

shane
2017-12-15 17:29
fan-in means you can have multiple "stages" that a Machine might "find itself" in - and if it matches one of those stages, then advance in the workflow

vlowther
2017-12-15 17:30
Ignore fan-in and fan-out for now.

vlowther
2017-12-15 17:30
:slightly_smiling_face:

marc.heckmann
2017-12-15 17:30
ok. that's always nice :slightly_smiling_face:

marc.heckmann
2017-12-15 17:31
So for the workflow change from 1 to 2 that Terraform can do, technically speaking, how is that achieved: API call to change profile on the machine object?

vlowther
2017-12-15 17:31
API call to change stage.

vlowther
2017-12-15 17:32
discover, terraform-ready, etc. are all stages

marc.heckmann
2017-12-15 17:32
ok. I guess I lost you there. How does a stage change initiate another Workflow?

vlowther
2017-12-15 17:33
We basically chain them together.

vlowther
2017-12-15 17:34
the change-stage/map parameter consists of a map of currentStage: nextStage:action

vlowther
2017-12-15 17:35
so if we run all the tasks in the current stage successfully, we will update the machine to use the next stage and reboot if the boot environment says so.

marc.heckmann
2017-12-15 17:35
At this point, I probably need to get my hands a little dirtier with it to figure things out. Thanks for the help ! I'm sure you'll hear from us again!

vlowther
2017-12-15 17:35
(that is, if the new stage wants a boot environment other than the one we are in)
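
The `currentStage: nextStage:action` lookup can be sketched like this. It is a simplified model, not DRP's runner code: the map is a plain dict, the stage names are the ones mentioned in this channel, and the only thing modelled is splitting an entry into the next stage and an optional action:

```python
def next_step(stage_map, current_stage):
    """Return (next_stage, action) for current_stage, or None if the map has no entry."""
    entry = stage_map.get(current_stage)
    if entry is None:
        return None  # nothing to chain to: the machine stays in its current stage
    next_stage, _, action = entry.partition(":")
    return next_stage, (action or None)

# Illustrative change-stage/map value; the entries themselves are examples.
stage_map = {
    "centos-7-install": "ssh-keys",
    "ssh-keys": "complete",
    "decommission-k8s-node": "discover:Reboot",
}

stage = "centos-7-install"
while (step := next_step(stage_map, stage)) is not None:
    print(stage, "->", step)
    stage = step[0]
```

Because each finished stage just looks up its successor, two "workflows" are really two chains in the same map, and an external tool such as the Terraform provider moves a machine from one chain to the other by setting its stage.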

marc.heckmann
2017-12-15 17:37
ok. one last thing: I understand that provision doesn't handle firmware updates, but we could create a stage that does that right?

vlowther
2017-12-15 17:39
Sure. I have been working on one for Dell gear that uses the DSU tool and its associated repos.

2017-12-15 17:39
Time to feed the :bear:!

vlowther
2017-12-15 17:39
and a more generic and finicky one for other systems.

marc.heckmann
2017-12-15 17:39
Obviously OOBM server control is something that we might build a higher level wrapper for (we actually want to tie all this in with Digital Ocean's Netbox: https://github.com/digitalocean/netbox)

vlowther
2017-12-15 17:40
and I eagerly await the day this stuff comes to a semblance of standardization.

marc.heckmann
2017-12-15 17:40
I imagine the full fledged DigitalRebar product is already using "provision" ?

vlowther
2017-12-15 17:43
provision is a streamlined and simplified version of the older DigitalRebar product.

lae
2017-12-15 17:43
netbox woo

marc.heckmann
2017-12-15 17:44
right but the older product supports things like configuring IPMI + firmwares, but it does not yet leverage provision internally? Or is that something that you have yet to build around provision?

lae
2017-12-15 17:44
if you're using ansible might I suggest using my ansible role for maintaining netbox? :wink:

vlowther
2017-12-15 17:45
Don't let the device42 guys hear you say that. :slightly_smiling_face: We have a few of them hanging around here.

marc.heckmann
2017-12-15 17:45
@lae I think we already have something for Netbox, we're already pretty deeply invested in it.

marc.heckmann
2017-12-15 17:45
What is device42?

lae
2017-12-15 17:45
another dcim solution

marc.heckmann
2017-12-15 17:46
ok :slightly_smiling_face: I happen to love the elegance and simplicity of Netbox, but other solutions have their advantages too.

lae
2017-12-15 17:46
and yeah, same here, hence why I have a pretty battle-tested netbox ansible role (that has actually caught a bug in netbox before rolling out to prod hah)

marc.heckmann
2017-12-15 17:47
My DC team has one big gripe w/ netbox which is the lack of proper cable plant support (the famous issue #20 https://github.com/digitalocean/netbox/issues/20) but that will come eventually

greg
2017-12-15 18:06
@marc.heckmann DRP has content packages and plugins that allow for IPMI configuration and driving today. We are porting over the Bios/Raid/Component update pieces.

greg
2017-12-15 18:07
For example, there are plugins that allow for IPMI operations for bare metal, http://packet.net, and virtualbox.

marc.heckmann
2017-12-15 18:07
@greg ok, I'll check out those packages.

marc.heckmann
2017-12-15 18:07
Nothing for DNS yet?

greg
2017-12-15 18:08
well - we did in DRv2, but could do it. We actually have some requests to port that over. I'm mixed. Users have lots of different plans for that.

greg
2017-12-15 18:08
I'm going slowly.

greg
2017-12-15 18:08
I could envision a plugin that updates DNS based upon provisioning events. Doable, but not sure that is an immediate priority.

greg
2017-12-15 18:09
I can see workflow tasks that update DNS from the machines as well. lots of paths.

marc.heckmann
2017-12-15 18:09
ok. I guess it wouldn't be too difficult for us to add a stage for that. I'm thinking of integration into some sort of an API driven DNS.

greg
2017-12-15 18:11
Yeah - that is the point. We have a tool, rebar-dns, that is a go program in DRv2 that could be pulled over. It provided a RESTful endpoint for updating and controlling DNS.

greg
2017-12-15 18:11
It could manage BIND locally, send messages to PowerDNS, or call nsupdate for "open" updates.

florent.wagener
2017-12-15 20:23
i'm trying to do a basic centos-7 install using DRP, unfortunately I am hitting an issue here. During the installation, this error is popping: `failed to fetch kickstart from http://192.168.1.253:8091/machines/<UUID>/compute.ks`

florent.wagener
2017-12-15 20:24
I am surely missing something but can't find what.

vlowther
2017-12-15 20:24
well...

vlowther
2017-12-15 20:25
localhost is definitely not a good sign

vlowther
2017-12-15 20:25
that should be an address on DRP

florent.wagener
2017-12-15 20:26
oh it's not localhost, the url is http://192.168.1.253:8091/

florent.wagener
2017-12-15 20:27
and after that error, the server enters the emergency mode

vlowther
2017-12-15 20:27
ok

vlowther
2017-12-15 20:28
The server you are trying to provision?

florent.wagener
2017-12-15 20:28
yeah

vlowther
2017-12-15 20:28
ok

florent.wagener
2017-12-15 20:28
my DRP server is 192.168.1.253

florent.wagener
2017-12-15 20:29
the install is fresh and the discovery is working like a charm

vlowther
2017-12-15 20:29
does curl -fgL http://192.168.1.253:8091/machines/<UUID>/compute.ks from the machine dr-provision is running on give you anything?

florent.wagener
2017-12-15 20:30
It is displaying the kickstart yes.

vlowther
2017-12-15 20:31
ok

vlowther
2017-12-15 20:31
Is the server you are trying to provision on the same subnet as the server running dr-provision?

vlowther
2017-12-15 20:33
or are they on different subnets

florent.wagener
2017-12-15 20:35
no they're not

florent.wagener
2017-12-15 20:35
but it works in the dracut shell

vlowther
2017-12-15 20:35
is just looking for what the network layout looks like

florent.wagener
2017-12-15 20:36
through the default gateway

vlowther
2017-12-15 20:36
hm

vlowther
2017-12-15 20:36
so in the dracut shell you could curl/wget the kickstart?

florent.wagener
2017-12-15 20:36
yeah

vlowther
2017-12-15 20:36
ok

vlowther
2017-12-15 20:38
Have you tried bouncing the server to see if the problem comes back?

florent.wagener
2017-12-15 20:38
yeah

florent.wagener
2017-12-15 20:39
got the error everytime I try :slightly_smiling_face:

vlowther
2017-12-15 20:39
That is interesting. :confused:

florent.wagener
2017-12-15 20:40
I'm looking with @marc.heckmann, we think we know what it is (possibly vagrant management network)

florent.wagener
2017-12-15 20:40
we're testing in a virtual environment

vlowther
2017-12-15 20:40
I would be curious to see what (if anything) is in the logs from anaconda

vlowther
2017-12-15 20:40
hm hm.

florent.wagener
2017-12-15 20:41
I think we need to tell kickstart what NIC to use.

florent.wagener
2017-12-15 20:43
it looks like ksdevice=bootif is not working.

vlowther
2017-12-15 20:43
What does networking look like in your Vagrant env?

vlowther
2017-12-15 20:43
ok, that is really weird.

vlowther
2017-12-15 20:44
Competing DHCP servers, perhaps?

florent.wagener
2017-12-15 20:44
it's point to point tunnel through a virtual switch: https://github.com/CumulusNetworks/topology_converter

florent.wagener
2017-12-15 20:44
we're looking at competing DHCP servers.

vlowther
2017-12-15 20:44
Are you using the DHCP server built in to DRP?

florent.wagener
2017-12-15 20:44
yes

vlowther
2017-12-15 20:45
and another one at the same time, it looks like. :confused:

vlowther
2017-12-15 20:46
If you turn on DHCP logs via the UX (just set it to a value other than Off)

vlowther
2017-12-15 20:46
then the logs for DRP will probably show what DHCP server is conflicting.

vlowther
2017-12-15 20:49
Or even if the logs are Off.

vlowther
2017-12-15 20:50
@greg We need a way to surface subsystem logging in the UX.

greg
2017-12-15 20:50
@vlowther Agreed.

florent.wagener
2017-12-15 20:52
It really looks like the other interface (Vagrant mgmt network) was up, at least temporarily, via DHCP despite the fact that we told KS to explicitly use another interface as BOOTIF

florent.wagener
2017-12-15 20:53
Here's the weird thing: `BOOTIF=01-a0-00-00-00-00-52` . The `01` prefixed at the beginning is not part of the MAC. Where does that come from?

florent.wagener
2017-12-15 20:54
Don't know if that's normal or not

florent.wagener
2017-12-15 20:54
Going to dig a little deeper

vlowther
2017-12-15 20:54
01 indicates that it is an Ethernet device, IIRC

vlowther
2017-12-15 20:55
It is totally normal.

greg
2017-12-15 20:55
yes to ethernet device. It is normal.
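
For anyone scripting around this, here is a minimal Python sketch of how a `BOOTIF` value breaks down: a leading hardware-type field (`01` for Ethernet, as confirmed above) followed by the six MAC octets, all dash-separated. The helper name `bootif_to_mac` is made up for illustration and is not part of DRP, dracut, or kickstart.

```python
import re


def bootif_to_mac(bootif: str) -> str:
    """Turn a BOOTIF value such as '01-a0-00-00-00-00-52' into a MAC address."""
    parts = bootif.strip().lower().split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {len(parts)}")
    hw_type, octets = parts[0], parts[1:]
    if hw_type != "01":
        # 01 is the hardware type for Ethernet; other types are out of scope here.
        raise ValueError(f"unsupported hardware type {hw_type!r}")
    for octet in octets:
        if not re.fullmatch(r"[0-9a-f]{2}", octet):
            raise ValueError(f"bad octet {octet!r}")
    return ":".join(octets)


print(bootif_to_mac("01-a0-00-00-00-00-52"))  # a0:00:00:00:00:52
```

The result can be compared directly against the MAC of the NIC that was expected to PXE boot.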

greg
2017-12-15 20:56
Vagrant is tricky, I think. I use virtualbox without vagrant to avoid the interface problems.

vlowther
2017-12-15 20:57
and I use kvm via a couple of wrapper scripts.

florent.wagener
2017-12-15 20:58
We don't have a choice unfortunately. We'll figure this out. The weird thing is that the dracut environment has the Vagrant interface up temporarily and then, by the time everything times out and we drop to shell, we only have the correct interface up.

vlowther
2017-12-15 20:58
Historically, Vagrant has tried to be helpful in ways that are the opposite for testing our provisioning code.

florent.wagener
2017-12-15 20:58
I guess it's normal since we're not even at kickstart yet.

florent.wagener
2017-12-15 21:17
Actually, looking at the Dracut logs, we see that Dracut does the right thing and Vagrant NIC is never brought up

florent.wagener
2017-12-15 21:18
the problem is something else

florent.wagener
2017-12-15 21:19
It looks like the DHCP lease refresh time is really short: 30sec

greg
2017-12-15 21:23
Yes - probably. You can change that in your subnet definition.

greg
2017-12-15 21:23
The active lease time is meant to be short by default, but you can change it.

florent.wagener
2017-12-15 21:27
Lease renewal isn't the issue there. Just tried it.

florent.wagener
2017-12-15 21:27
Found the problem: No space left on device to save the KS file :slightly_smiling_face:

vlowther
2017-12-15 21:28
ah, well, then.

vlowther
2017-12-15 21:29
Not enough memory?

florent.wagener
2017-12-15 21:29
looks like it couldn't mount a loop device

florent.wagener
2017-12-15 21:29
1GiB of RAM. Enough?

vlowther
2017-12-15 21:30
I don't actually know what the minimum for a CentOS install is these days.

florent.wagener
2017-12-15 21:30
hmmm

shane
2017-12-15 21:31
Try 1.5 GiB - I think I ran in to that before

shane
2017-12-15 21:31
And 1.5 worked

vlowther
2017-12-15 21:31
I give my KVM slaves 4 GB each

florent.wagener
2017-12-15 21:33
It's actually complaining about bad superblock, etc.. when trying to mount the loop device

vlowther
2017-12-15 21:34
During the OS install, or during Sledgehammer?

florent.wagener
2017-12-15 21:35
OS install in dracut

vlowther
2017-12-15 21:39
That sounds like file corruption somewhere along the way.

florent.wagener
2017-12-15 21:40
seems to work w/ 2GiB

florent.wagener
2017-12-15 21:41
Thanks for the help guys :)

greg
2017-12-15 21:50
:slightly_smiling_face: YEAH!

jtyo
2017-12-15 22:25
has joined #json

dsternesky
2017-12-15 22:45
has joined #json

shane
2017-12-15 23:06
@jtyo and @dsternesky welcome

jtyo
2017-12-15 23:14
@shane Thank you!

mroth
2017-12-15 23:35
has joined #json

wdennis
2017-12-16 02:42
@lae on Galaxy?

lae
2017-12-16 06:42

lae
2017-12-16 06:43
(stats were reset on accident yesterday so don't mind it :sweat_smile:)

shane
2017-12-16 15:12
@mroth welcome

2017-12-17 22:51
hi

2017-12-17 22:52
How does Rebar manage DHCP or DNS? Does it embed these services, or use existing ones like ISC dhcpd or BIND?

shane
2017-12-17 23:55
hi @chronidev - Digital Rebar Provision (DRP) does contain an embedded DHCP service (alongside TFTP, PXE, and HTTP, plus HTTPS for the API). You can also use any external DHCP server (turn off the embedded DHCP server), or you can use the Proxy DHCP service within DRP

shane
2017-12-17 23:56
however, we do not have any embedded DNS services ... though through the Workflow management system, it's fairly trivial to write a Stage that would be able to add dynamic DNS update capabilities to external DNS services

shane
2017-12-17 23:57
we are looking at reviving some of our older DNS integration code for our newer DRP service, but at the moment, it is not integrated in today's product

shane
2017-12-17 23:57
you can find some of the documentation on DHCP in our docs at: http://provision.readthedocs.io/en/latest/doc/configuring.html

zehicle
2017-12-18 01:43
also, @chronidev, a key difference w/ DRP DHCP is that it is API driven. While you can pre-configure using content packages or pre-positioning config files, the primary design pattern is to use the API to make atomic updates. You can also subscribe to DHCP events via websockets (really, any DRP activity).

zehicle
2017-12-18 01:46
I just heard a shout out to Netbox on Packetpushers Podcast (#112) - the one about data center environments.

vlowther
2017-12-18 21:33
ok, question time.

solidgrid
2017-12-18 21:33
has joined #json

vlowther
2017-12-18 21:34
I am contemplating turning repository management stuff currently managed via the provisioner-repos param into a new top-level object in the API

vlowther
2017-12-18 21:34
tentatively called Repos.

vlowther
2017-12-18 21:36
How many of y'all are using the package-repositories functionality?

vlowther
2017-12-18 21:36
I want to get an idea of what the impact of migrating things will be.

wdennis
2017-12-18 23:14
@vlowther what even is the "package-repositories" functionality? I just consume the "official" upstream repos...

shane
2017-12-18 23:19
@lae are you using the provisioner-repo functions ?

shane
2017-12-18 23:19
@wdennis that's one vote for "no, I don't use it", then :slightly_smiling_face:

lae
2017-12-18 23:53
@shane I have yet to make the switchover to it

shane
2017-12-18 23:54
ok - good - @vlowther is looking to make some changes to integrate the repo handling a bit better, but it'll break current usage

lae
2017-12-18 23:54
yeah, that's fine with me

lae
2017-12-18 23:56
btw @wdennis when you're limited by a 500mbit pipe to the internet being shared across an entire DC (meant to be consumed internally mostly, hence the limitation), provisioning 10 machines using an internet facing repo is really slow

2017-12-19 01:17
This message was deleted.

wdennis
2017-12-19 01:17
You have to mirror (or cache) then

shane
2017-12-19 01:18
which is what the `provisioner-repos` param is for :slightly_smiling_face:

wdennis
2017-12-19 01:18
@lae yup, understand

wdennis
2017-12-19 01:19
Jeez, I fail at Slack

shane
2017-12-19 01:20
:slightly_smiling_face:

wdennis
2017-12-19 01:20
So you can delete a msg with a threaded reply then...

shane
2017-12-19 01:20
threaded replies are stupid (IMO)

wdennis
2017-12-19 01:20
It was accidental even

shane
2017-12-19 01:21
I guess in very active slack channels, it allows for "side conversations" ... but they did a horrible job of the UI - it's nearly impossible to follow threads

wdennis
2017-12-19 01:22
Yeah, I guess they wanted to keep sidebars out of the main flow, but agree to the confusing UI implementation

wdennis
2017-12-19 01:23
You folks still looking at image-based OS installs?

shane
2017-12-19 01:25
oh yes

shane
2017-12-19 01:26
we have a customer implementation that uses "curtin" as the piece to lay down the images - but it's not a very good implementation - we're still looking at other tools to do a better job of the bare metal prep for the image - I believe one strong contender is "ignition" from CoreOS - but that piece isn't baked yet

shane
2017-12-19 01:27
I doubt we'll release the "curtin" based solution simply because it's too limited and highly customized for the one customer implementation

wdennis
2017-12-19 01:28
Hopefully that'll make the "open core" cut

shane
2017-12-19 01:29
I don't have any info for you on that front - sorry ...

wdennis
2017-12-19 01:30
N/p - just would be cool

shane
2017-12-19 17:17
- our v007 community online meetup starts in just under 2 hours (11am PST). Check out the agenda at: https://docs.google.com/document/d/1BoWbo114IOT4HInnlfZB6KmDgqyxtQX2_qkzpbxEp1o


zehicle
2017-12-19 19:01
Meeting is starting now: https://zoom.us/j/3403934274


shane
2017-12-21 18:05
- the v007 meetup recording is available at: https://youtu.be/ZEkzMKhe0f4
Great discussions on:
- KubeCon/CloudNativeCon recap and insights
- updates to the DRP Terraform Provider capabilities (what can't this thing do!!?)
- Package Repository architectural changes
- Immutable Kubernetes (KRIB) extensions and features discussion
- L8ist Sh9y audio podcast on Patch APIs, Swagger, and Integrations

shane
2017-12-21 18:06
...AND... greg is about to announce:

greg
2017-12-21 18:06
- Also, v3.5.0 of DRP is published to stable. v1.4.0 of the content packages and plugins are out as well.

greg
2017-12-21 18:06
:slightly_smiling_face:


wdennis
2017-12-21 22:21
Up on v3.5 (w00t!) but have an issue with adding KRIB (which I want to try to play with over the upcoming holiday)

wdennis
2017-12-21 22:22
I looked in the catalog and saw this, and "+"d it into my DRP system

wdennis
2017-12-21 22:22

wdennis
2017-12-21 22:23
But now when I look in my Content screen, I see this:


wdennis
2017-12-21 22:24
So do I have version 1.4.0, or "tip"?

greg
2017-12-21 22:24
Click the drop down arrow and see what it says.

greg
2017-12-21 22:24
You may have to select 1.4.0

greg
2017-12-21 22:24
and update.

wdennis
2017-12-21 22:25
I have choices "tip" or "v1.1.0"

greg
2017-12-21 22:25
hmm

wdennis
2017-12-21 22:26
I did not "Transfer" yet

greg
2017-12-21 22:27
For mine, I had to scroll down to get the rest of the options.

greg
2017-12-21 22:27
I had to open the arrow, then mouse wheel down to get to 1.4.0

wdennis
2017-12-21 22:28
did not notice the scroll capability in the drop - I did scroll, select "v1.4.0", and transfer

greg
2017-12-21 22:28
that should be good

wdennis
2017-12-21 22:28
Why does the UX not show all values by default?

greg
2017-12-21 22:29
On mine it does, it just could be off the screen. It doesn't auto scroll the list into view.

wdennis
2017-12-21 22:29
OK

wdennis
2017-12-21 22:29
Anyways, all good now.

wdennis
2017-12-21 22:30
How many nodes minimum for a decent KRIB install - 3, 4?

greg
2017-12-21 22:30
at least 2 to play with.

greg
2017-12-21 22:30
You will get N-1 minions and 1 master.

wdennis
2017-12-21 22:30
OK, have three, could rack & set up a 4th if needed.



florent.wagener
2017-12-22 01:15
hey guys, thanks again for the help the other day. Since the CentOS provisioning is working well, now I wanna try to provision Windows. I didn't find a lot of help in the documentation, do you have any tips for me?

shane
2017-12-22 02:08
@florent.wagener - we can do winders provisioning, but we need to talk a bit first - tmw what's a good time for you ?

florent.wagener
2017-12-22 02:10
@shane From 10Am to 3-5Pm EST :)

shane
2017-12-22 02:11
ok - ping us when free around those times

florent.wagener
2017-12-22 02:44
Alright, thanks, will do

wdennis
2017-12-22 02:51
- trying to follow @zehicle's KRIB install from the webinar; trying to set up k8s on installed nodes

wdennis
2017-12-22 02:53
In the webinar vid, he shows a stage map that has a task named "runner-service" - I don't have that on my system (v3.5.0)

wdennis
2017-12-22 02:53
Where to get that?

greg
2017-12-22 03:06
task-library has it.

greg
2017-12-22 03:07
@wdennis

wdennis
2017-12-22 03:09
Thx @greg

wdennis
2017-12-22 03:10
What makes stuff appear in the Content section by default vs having to get it from the Catalog?

greg
2017-12-22 03:12
settings and some other things like bugs

wdennis
2017-12-22 03:15
lol

wdennis
2017-12-22 03:20
OK. So I have already installed nodes (U16.04) that are sitting at stage "complete-nowait" with bootenv "local"

wdennis
2017-12-22 03:21
I don't have to reinstall them to get into the KRIB install, do I? Can I just start with the Docker install? How to structure the stage map?

wdennis
2017-12-22 03:25
Does this look like a workable stage map? (wondering about the beginning task)


wdennis
2017-12-22 03:54
N/m, realized with no runner process running on the node, it would never take instructions to do anything...

wdennis
2017-12-22 03:55
Starting with OS install & going from there

wdennis
2017-12-22 14:18
Try #2 stage-map:


wdennis
2017-12-22 14:26
Does the `krib/cluster-profile` param value have to be set to the name of the profile? Here's what I have:

wdennis
2017-12-22 14:26

greg
2017-12-22 14:31
yes - it needs to be

ctrees
2017-12-22 14:33
@wdennis thanks for the blow by blow.... I want to do the same thing over vacation and this has already helped my 'motivation'

wdennis
2017-12-22 14:34
@ctrees n/p, please follow along as I bloody myself :wink:

wdennis
2017-12-22 14:34
@greg something's not going correctly here...

wdennis
2017-12-22 14:36
Machines all seem to be stuck in these states:

wdennis
2017-12-22 14:36

wdennis
2017-12-22 14:38
What exactly is the `finish-install` task for?

greg
2017-12-22 14:39
Your workflow is not what Rob was using.

wdennis
2017-12-22 14:41
Hmmm... Other than starting with U16.04 rather than C7, I think it is... Duped it off the webinar vid

greg
2017-12-22 14:41
Can you open change-stage/map and show me?

greg
2017-12-22 14:42
also make sure the `k8s-cluster1` is the only profile on the node with a `change-stage/map`

greg
2017-12-22 14:42
You should have two workflows

greg
2017-12-22 14:42
Discover workflow: discover->sledgehammer:Success

greg
2017-12-22 14:47
Krib Workflow for Install:
ubuntu-16.04-install->runner-service:Success
runner-service->finish-install:STOP <- This is STOP
finish-install->docker-install:Success
docker-install->krib-install:Success
krib-install->complete:Success

greg
2017-12-22 14:47
to add centos7 -> centos-7-install->runner-service:Success

wdennis
2017-12-22 14:47
```
[dradmin@dr-admin drp]$ drpcli profiles show k8s-cluster1
{
  "Available": true,
  "Description": "",
  "Errors": [],
  "Meta": {
    "color": "",
    "icon": "",
    "title": ""
  },
  "Name": "k8s-cluster1",
  "Params": {
    "access-keys": {
      "root": "ssh-rsa .... will@Wills-MacBook-Air"
    },
    "access-ssh-root-mode": "yes",
    "change-stage/map": {
      "docker-install": "krib-install:Success",
      "finish-install": "docker-install:Success",
      "krib-install": "complete:Success",
      "runner-service": "finish-install:Success",
      "ssh-access": "runner-service:Success",
      "ubuntu-16.04-install": "ssh-access:Success"
    },
    "dns-domain": "http://nec-labs.com",
    "krib/cluster-profile": "k8s-cluster1",
    "local-repo": false,
    "ntp_servers": [
      "138.15.xxx.97",
      "138.15.yyy.4"
    ],
    "operating-system-disk": "sda",
    "provisioner-default-fullname": "IST Group",
    "provisioner-default-password-hash": "...",
    "provisioner-default-user": "istgroup"
  },
  "ReadOnly": false,
  "Validated": true
}
```

greg
2017-12-22 14:48
finish-install:Stop in the runner-service piece

wdennis
2017-12-22 14:48
Yes, just saw that

wdennis
2017-12-22 14:49
So, why?

greg
2017-12-22 14:49
Because it is how you Stop the runner to let the install complete.

wdennis
2017-12-22 14:49
Ah, if not, race condition?

greg
2017-12-22 14:49
docker-install and krib-install can NOT be run in the install environment because it must start services in systemd that are not allowed.

wdennis
2017-12-22 14:51
Any way to edit that step in the Profile via drpcli so as to just change `"runner-service": "finish-install:Success"` to `"runner-service": "finish-install:Stop"` ?

greg
2017-12-22 14:53
yes - that would match what I'm saying.

wdennis
2017-12-22 14:56
OK, just re-set the stage-map thru the UX; now I have it as:
```
"change-stage/map": {
  "docker-install": "krib-install:Success",
  "finish-install": "docker-install:Success",
  "krib-install": "complete:Success",
  "runner-service": "finish-install:Stop",
  "ssh-access": "runner-service:Success",
  "ubuntu-16.04-install": "ssh-access:Success"
},
```
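
The one-word difference between the two maps is easy to miss, so here is a small Python sketch of a check for it. It only encodes the rule stated in this thread (the transition into `finish-install` has to use `Stop`); the function name `lint_stage_map` is hypothetical, and this is not a DRP feature.

```python
def lint_stage_map(stage_map):
    """Return a list of problems found in a change-stage/map style dict."""
    problems = []
    for stage, target in stage_map.items():
        # Values look like "next-stage:Action"; split on the last colon.
        next_stage, sep, action = target.rpartition(":")
        if not sep or not next_stage:
            problems.append(f"{stage}: expected 'next-stage:Action', got {target!r}")
            continue
        if next_stage == "finish-install" and action != "Stop":
            problems.append(
                f"{stage}: transition into finish-install uses {action}, expected Stop"
            )
    return problems


BROKEN = {
    "docker-install": "krib-install:Success",
    "finish-install": "docker-install:Success",
    "krib-install": "complete:Success",
    "runner-service": "finish-install:Success",
    "ssh-access": "runner-service:Success",
    "ubuntu-16.04-install": "ssh-access:Success",
}
FIXED = dict(BROKEN, **{"runner-service": "finish-install:Stop"})

print(lint_stage_map(BROKEN))
print(lint_stage_map(FIXED))
```

The first call reports the `runner-service` entry; the second returns an empty list.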

wdennis
2017-12-22 14:58
Look OK?

greg
2017-12-22 14:59
For the krib workflow.

greg
2017-12-22 15:00
You should also add the discover transitions as well. So that you can recycle the nodes if you need to.

wdennis
2017-12-22 15:00
Oh, OK - skipped that since I had already discovered the nodes previously

wdennis
2017-12-22 15:08
Also - from my prior try, I have another k8s profile I bulk added to the machines, that I cannot seem to get rid of now (at least thru the UX) - it's named `k8s-cluster-installer`
```
[dradmin@dr-admin drp]$ drpcli machines list | jq .[].Profiles
[
  "k8s-cluster-installer",
  "k8s-cluster1"
]
[
  "k8s-cluster-installer",
  "k8s-cluster1"
]
[
  "k8s-cluster-installer",
  "k8s-cluster1"
]
[
  "k8s-cluster-installer",
  "k8s-cluster1"
]
```

greg
2017-12-22 15:08
You need to remove those or remove their parameters.

wdennis
2017-12-22 15:09
It does not have a stage-map or params...
```
[dradmin@dr-admin drp]$ drpcli profiles show k8s-cluster-installer
{
  "Available": true,
  "Description": "",
  "Errors": [],
  "Meta": {
    "color": "black",
    "icon": "hashtag",
    "title": "User added profile"
  },
  "Name": "k8s-cluster-installer",
  "Params": {},
  "ReadOnly": false,
  "Validated": true
}
```

greg
2017-12-22 15:09
okay - that is fine. It shouldn't interfere

wdennis
2017-12-22 15:09
But for some reason, the UX won't let me remove it from the machines...

greg
2017-12-22 15:09
in bulk? or machine edit?

wdennis
2017-12-22 15:10
Either

greg
2017-12-22 15:10
Hard refresh the bulk action page.

greg
2017-12-22 15:11
Then remove the profiles from the machine. See if the UX was out of date.

wdennis
2017-12-22 15:11
Getting this error:

wdennis
2017-12-22 15:12

wdennis
2017-12-22 15:12
When I try to delete the profile from a machine

greg
2017-12-22 15:13
show me a machine please.

wdennis
2017-12-22 15:15
What's the `jq` syntax to omit the gohai inventory?

greg
2017-12-22 15:15
or do `jq .[].Profiles`

wdennis
2017-12-22 15:16
Representative machine:
```
[dradmin@dr-admin drp]$ drpcli machines show "1bcd8472-6c20-47b3-b9ff-f32731905bf1" | jq .Profiles
[
  "k8s-cluster-installer",
  "k8s-cluster1"
]
```

greg
2017-12-22 15:16
```drpcli machines removeprofile 1bcd8472-6c20-47b3-b9ff-f32731905bf1 k8s-cluster-installer```

wdennis
2017-12-22 15:18
OK, that worked

wdennis
2017-12-22 15:18
Now I have to fix the other three
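
Fixing the rest by hand works, but the same cleanup can be generated from the machine list. Below is a Python sketch that reads JSON in the shape of `drpcli machines list` output and prints one `drpcli machines removeprofile` command (the command used above) per affected machine. It assumes each machine object carries its ID in a `Uuid` field next to `Profiles`; that field name and the second sample machine are assumptions, so check them against real output before running anything.

```python
import json


def removeprofile_commands(machines_json: str, profile: str) -> list:
    """Build removeprofile commands for every machine that has `profile`."""
    machines = json.loads(machines_json)
    commands = []
    for machine in machines:
        # Profiles may be missing or null on a machine; treat both as empty.
        if profile in (machine.get("Profiles") or []):
            commands.append(
                f"drpcli machines removeprofile {machine['Uuid']} {profile}"
            )
    return commands


# Sample input: the first UUID is from the chat, the second machine is made up.
SAMPLE = json.dumps([
    {"Uuid": "1bcd8472-6c20-47b3-b9ff-f32731905bf1",
     "Profiles": ["k8s-cluster-installer", "k8s-cluster1"]},
    {"Uuid": "00000000-0000-0000-0000-000000000002",
     "Profiles": ["k8s-cluster1"]},
])

for command in removeprofile_commands(SAMPLE, "k8s-cluster-installer"):
    print(command)
```

Printing the commands instead of executing them leaves room to review the list first.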

greg
2017-12-22 15:18
It is a UX bug I think

wdennis
2017-12-22 15:21
OK, down to the one Profile/machine. Let's try again.

wdennis
2017-12-22 15:22
And we're off

ctrees
2017-12-22 15:36
when you get greg chat'n I get lots of insights into the mental 'D&D' map of DRP (wizards playbook) with lighted sign posts (your UI images)... super insightful 4me... wishing for a 'christmas krib miracle'

greg
2017-12-22 15:40
lol

wdennis
2017-12-22 15:54
Looks promising...
```
[discovery] Successfully established connection with API Server "192.168.1.114:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
Finished successfully
Command exited with status 0
Action krib-install.sh.tmpl finished
Task krib-install finished
Updated job 12fb4cb3-c96d-47ab-b7fe-0d07071fd818 to finished
```

greg
2017-12-22 15:55
looks good

wdennis
2017-12-22 15:57
And on the master (192.168.1.114):
```
Determinig master - see if it is already decided or configured
"1bcd8472-6c20-47b3-b9ff-f32731905bf1"
Master is 1bcd8472-6c20-47b3-b9ff-f32731905bf1
I am master - run kubeadm
MAKE SURE SWAP IS OFF!- kubeadm requirement
[WARNING FileExisting-crictl]: crictl not found in system path
Starting calico networking...
configmap "calico-config" created
daemonset "calico-etcd" created
service "calico-etcd" created
daemonset "calico-node" created
deployment "calico-kube-controllers" created
deployment "calico-policy-controller" created
clusterrolebinding "calico-cni-plugin" created
clusterrole "calico-cni-plugin" created
serviceaccount "calico-cni-plugin" created
clusterrolebinding "calico-kube-controllers" created
clusterrole "calico-kube-controllers" created
serviceaccount "calico-kube-controllers" created
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role "kubernetes-dashboard-minimal" created
rolebinding "kubernetes-dashboard-minimal" created
deployment "kubernetes-dashboard" created
service "kubernetes-dashboard" created
clusterrolebinding "kubernetes-dashboard" created
Wait for admin container to start
```

wdennis
2017-12-22 15:58
Where is it that I can see the credentials to connect to the cluster? Thought it was on the machine's params...

wdennis
2017-12-22 16:00
Oh, the params are in the `k8s-cluster1` profile

wdennis
2017-12-22 16:10
YAAAAAAAAAAAS!
```
Wills-MacBook-Air:Documents will$ kubectl --kubeconfig=krib-cluster1-admin.conf get nodes
NAME         STATUS    ROLES     AGE       VERSION
testnode01   Ready     <none>    28m       v1.9.0
testnode02   Ready     <none>    28m       v1.9.0
testnode03   Ready     master    29m       v1.9.0
testnode04   Ready     <none>    28m       v1.9.0
```

shane
2017-12-22 16:15
woot woot

wdennis
2017-12-22 16:22
Anyone know how to remove a k8s proxy?

wdennis
2017-12-22 16:22
I was running minikube on my MacBook, but even after stopping minikube, the proxy to it remains...

wdennis
2017-12-22 16:23
```
Wills-MacBook-Air:Documents will$ kubectl get nodes
NAME       STATUS    ROLES     AGE       VERSION
minikube   Ready     <none>    14d       v1.8.0
Wills-MacBook-Air:Documents will$ kubectl --kubeconfig=krib-cluster1-admin.conf get nodes
NAME         STATUS    ROLES     AGE       VERSION
testnode01   Ready     <none>    38m       v1.9.0
testnode02   Ready     <none>    38m       v1.9.0
testnode03   Ready     master    39m       v1.9.0
testnode04   Ready     <none>    38m       v1.9.0
Wills-MacBook-Air:Documents will$ minikube stop
Stopping local Kubernetes cluster...
Machine stopped.
Wills-MacBook-Air:Documents will$ kubectl --kubeconfig=krib-cluster1-admin.conf proxy
F1222 11:21:45.491092    5271 proxy.go:153] listen tcp 127.0.0.1:8001: bind: address already in use
```

zehicle
2017-12-22 17:10
netstat for the port, kill the process?

wdennis
2017-12-22 17:18
@zehicle I was hoping for something like `kubectl proxy delete` or the like :slightly_smiling_face:

wdennis
2017-12-22 17:19
Interestingly enough, couldn't see it in `netstat`...

wdennis
2017-12-22 17:20
But there was a `kubectl proxy` process running, which I `-TERM`'d
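
The "address already in use" failure can also be checked for before starting a new proxy. This is a minimal Python sketch that simply tries to bind the same address and port the proxy reported (127.0.0.1:8001). The helper name `port_in_use` is made up, and the check only says whether something holds the port, not which process it is.

```python
import errno
import socket


def port_in_use(host: str, port: int) -> bool:
    """Return True if binding host:port fails because the address is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # some other problem (permissions, bad address, ...)
    return False


if port_in_use("127.0.0.1", 8001):
    print("port 8001 is taken; look for a leftover process before starting a proxy")
else:
    print("port 8001 is free")
```

If the port is taken, a process listing (as was done here) or `lsof -i :8001` can identify the holder.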

wdennis
2017-12-22 17:36
So now I can hit the k8s API OK, but not the Dashboard - getting:
```
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "no endpoints available for service \"https:kubernetes-dashboard:\"",
  "reason": "ServiceUnavailable",
  "code": 503
}
```

wdennis
2017-12-22 17:43
Duh - forgot that I had backgrounded it...

greg
2017-12-22 17:43
not sure - did it start? See if the pods are up. and what URL did you use to access it?

greg
2017-12-22 17:43
@wdennis

wdennis
2017-12-22 17:45
Does kubeadm start the kubernetes mgmt services? No pods are running...

wdennis
2017-12-22 17:45
```
Wills-MacBook-Air:Documents will$ kubectl --kubeconfig=krib-cluster1-admin.conf get pods
No resources found.
```

wdennis
2017-12-22 17:46
Could be that I'm just a k8s N00b and I don't know what I'm doing yet :confused:


wdennis
2017-12-22 17:52
Did the following per that page, got this:
```
Wills-MacBook-Air:Documents will$ kubectl --kubeconfig=krib-cluster1-admin.conf apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
secret "kubernetes-dashboard-certs" unchanged
serviceaccount "kubernetes-dashboard" unchanged
role "kubernetes-dashboard-minimal" unchanged
rolebinding "kubernetes-dashboard-minimal" unchanged
deployment "kubernetes-dashboard" unchanged
service "kubernetes-dashboard" unchanged
Wills-MacBook-Air:Documents will$ kubectl --kubeconfig=krib-cluster1-admin.conf proxy
Starting to serve on 127.0.0.1:8001
```

wdennis
2017-12-22 17:54

wdennis
2017-12-22 18:00
This is the process tree on the k8s master: ``` root@testnode03:~# pstree systemd???accounts-daemon???{gdbus} ? ??{gmain} ??acpid ??agetty ??atd ??cron ??dbus-daemon ??dhclient ??dockerd???containerd???5*[containerd-shim???pause] ? ? ? ??9*[{containerd-shim}]] ? ? ??containerd-shim???kube-scheduler???31*[{kube-scheduler}] ? ? ? ??11*[{containerd-shim}] ? ? ??containerd-shim???kube-controller???32*[{kube-controller}] ? ? ? ??10*[{containerd-shim}] ? ? ??containerd-shim???kube-apiserver???24*[{kube-apiserver}] ? ? ? ??9*[{containerd-shim}] ? ? ??containerd-shim???etcd???34*[{etcd}] ? ? ? ??9*[{containerd-shim}] ? ? ??containerd-shim???pause ? ? ? ??11*[{containerd-shim}] ? ? ??containerd-shim???pause ? ? ? ??10*[{containerd-shim}] ? ? ??containerd-shim???kube-proxy???32*[{kube-proxy}] ? ? ? ??10*[{containerd-shim}] ? ? ??containerd-shim???sh???etcd???32*[{etcd}] ? ? ? ??9*[{containerd-shim}] ? ? ??containerd-shim???runsvdir???runsv???confd???28*[{confd}] ? ? ? ? ??runsv???bird ? ? ? ? ??runsv???bird6 ? ? ? ? ??runsv???calico-felix???36*[{calico-felix}] ? ? ? ??10*[{containerd-shim}] ? ? ??containerd-shim???install-cni.sh???sleep ? ? ? ??10*[{containerd-shim}] ? ? ??36*[{containerd}] ? ??52*[{dockerd}] ??drpcli???31*[{drpcli}] ??irqbalance ??2*[iscsid] ??kubelet???39*[{kubelet}] ??lvmetad ??lxcfs???2*[{lxcfs}] ??mdadm ??polkitd???{gdbus} ? ??{gmain} ??rsyslogd???{in:imklog} ? ??{in:imuxsock} ? ??{rs:main Q:Reg} ??snapd???8*[{snapd}] ??sshd???sshd???sshd???bash???sudo???bash???pstree ??systemd???(sd-pam) ??systemd-journal ??systemd-logind ??systemd-timesyn???{sd-resolve} ??systemd-udevd ```

greg
2017-12-22 18:31
Try with 127.0.0.1 instead of local host.

greg
2017-12-22 18:31
kubectl --all-namespaces get pods

wdennis
2017-12-22 18:45
No `--all-namespaces` flag (at least on my `kubectl` version)

wdennis
2017-12-22 18:45
But, `man kubectl` FTW...

wdennis
2017-12-22 18:45
```
Wills-MacBook-Air:Documents will$ kubectl --kubeconfig=krib-cluster1-admin.conf -n=kube-system get pods
NAME                                     READY     STATUS             RESTARTS   AGE
calico-etcd-hwwn2                        1/1       Running            0          3h
calico-kube-controllers-d6c6b9b8-2lvz4   1/1       Running            0          3h
calico-node-bpzp2                        2/2       Running            0          3h
calico-node-mr6hx                        2/2       Running            0          3h
calico-node-vnscf                        2/2       Running            1          3h
calico-node-x6g2b                        2/2       Running            0          3h
etcd-testnode03                          1/1       Running            0          3h
kube-apiserver-testnode03                1/1       Running            0          3h
kube-controller-manager-testnode03       1/1       Running            0          3h
kube-dns-6f4fd4bdf-5lzmh                 2/3       CrashLoopBackOff   75         3h
kube-proxy-67745                         1/1       Running            0          3h
kube-proxy-6nmh7                         1/1       Running            0          3h
kube-proxy-lpc79                         1/1       Running            0          3h
kube-proxy-ztkp2                         1/1       Running            0          3h
kube-scheduler-testnode03                1/1       Running            0          3h
kubernetes-dashboard-7b7b5cd79b-pvj6w    0/1       CrashLoopBackOff   36         3h
```
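
In a listing this long the two unhealthy pods are easy to overlook. As a rough sketch, the Python below scans the plain-text table format shown above and reports every pod whose STATUS is not `Running`. The function name `unhealthy_pods` and the trimmed sample are illustrative only, and a pod in a normal terminal state such as `Completed` would also be flagged by this simple rule.

```python
def unhealthy_pods(kubectl_output: str) -> list:
    """Return (name, status) for every pod row whose STATUS is not Running."""
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    flagged = []
    for line in lines[1:]:
        cols = line.split()
        if cols[status_col] != "Running":
            flagged.append((cols[name_col], cols[status_col]))
    return flagged


# A few rows copied from the listing above.
SAMPLE = """
NAME                                     READY     STATUS             RESTARTS   AGE
calico-etcd-hwwn2                        1/1       Running            0          3h
kube-dns-6f4fd4bdf-5lzmh                 2/3       CrashLoopBackOff   75         3h
kube-proxy-67745                         1/1       Running            0          3h
kubernetes-dashboard-7b7b5cd79b-pvj6w    0/1       CrashLoopBackOff   36         3h
"""

for name, status in unhealthy_pods(SAMPLE):
    print(f"{status}: kubectl -n kube-system describe pod {name}")
```

`kubectl describe pod` lists the containers in each pod; `kubectl logs` (with `-c <container>` for multi-container pods such as kube-dns) is the natural next step.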

wdennis
2017-12-22 19:19
More on-topic (DRP-related), it doesn't seem intuitive to me that the `finish-install` stage has an action of `Stop` in the stage-map - shouldn't it be "Wait" or something?

ctrees
2017-12-22 20:42
isn't the 'stop' more of 'stop the runner from getting more commands', so the action of 'stop' in the stage-map is the flag put into the runner queue so that it knows to finish what it's doing, then 'get out of the way' ??? but I'm still sorting out the instructional layers and names in my head... my D&D wizard map has a lot of grey and black clouds....

greg
2017-12-22 22:35
@ctrees wins the "read the mind of Greg" award

greg
2017-12-22 22:36
Success could be renamed to wait but that isn't accurate either.

greg
2017-12-22 22:37
Success means do what the current stage says to do.

greg
2017-12-22 22:38
The stage has a runnerwait flag. Success says do that. Stop says: really, runner, ignore the stage's flag and stop

greg
2017-12-22 22:38
The third one is Reboot. That tells the runner to change stage and reboot.

greg
2017-12-22 22:39
This is the core of workflow. Though we need to address one last issue with it.

greg
2017-12-22 22:40
@wdennis your dashboard is crashing. You need to look at pod logs to see why.

greg
2017-12-22 22:41
Also dns is crashing. Logs may tell you there as well.

wdennis
2017-12-22 22:49
So the "action" is at the "front" of the stage? "Success" is "kick this off and if successful keep going", "Stop" is "kick this off and stop (wait)"? I thought it was at the "end", like Success = "if stage run was successful then change stage", "Stop" = "once stage is complete (ends successfully) then stop."

greg
2017-12-22 22:50
No

greg
2017-12-22 22:50
It is at the end

greg
2017-12-22 22:51
The change stage only happens on success. Otherwise a task failed, its job was marked failed, and the runner is paused because the machine is marked not runnable on failure

greg
2017-12-22 22:53
So the set of tasks in the stage completes, then the stage map is checked to see what the next stage and action are. If no stage is found, the current stage's runnerwait flag is used to determine if the runner should keep running

greg
2017-12-22 22:55
If a next stage is found, the machine stage is changed to that. Then the action is carried out. Success means loop back around and run new tasks if there are any, Reboot means reboot the machine, and Stop means stop the runner even if there are new tasks to do
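
The rules in the last few messages fit in one small function. The Python below is a toy model of the decision described here, made once a stage's tasks have all completed successfully; it is not DRP's implementation. The function name, the lowercase result strings, and the `RunnerWait` value given for `complete` are all made up for illustration.

```python
def after_stage_tasks(stage, stage_map, runner_wait):
    """Model what the runner does once every task in `stage` has succeeded.

    Returns (stage_after, runner_action), where runner_action is one of
    "continue", "reboot", "stop", "wait", or "exit".
    """
    target = stage_map.get(stage)
    if target is None:
        # No map entry: the current stage's RunnerWait flag decides.
        return stage, ("wait" if runner_wait[stage] else "exit")
    next_stage, _, action = target.rpartition(":")
    if action == "Success":
        return next_stage, "continue"  # loop around and run any new tasks
    if action == "Reboot":
        return next_stage, "reboot"    # change stage, then reboot the machine
    if action == "Stop":
        return next_stage, "stop"      # change stage, then stop the runner
    raise ValueError(f"unknown action {action!r} for stage {stage!r}")


STAGE_MAP = {
    "ubuntu-16.04-install": "ssh-access:Success",
    "ssh-access": "runner-service:Success",
    "runner-service": "finish-install:Stop",
    "finish-install": "docker-install:Success",
    "docker-install": "krib-install:Success",
    "krib-install": "complete:Success",
}
RUNNER_WAIT = {"complete": True}  # illustrative value only

print(after_stage_tasks("ssh-access", STAGE_MAP, RUNNER_WAIT))
print(after_stage_tasks("runner-service", STAGE_MAP, RUNNER_WAIT))
print(after_stage_tasks("complete", STAGE_MAP, RUNNER_WAIT))
```

The failure case is deliberately left out: per the explanation above, a failed task never reaches the stage map at all.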

wdennis
2017-12-22 22:55
So if the action does not happen until the end (successful completion) of the stage, why do you have to tell the runner to stop and wait until the OS install stage is complete via the "finish-install:Stop" stage?

greg
2017-12-22 22:56
Because it is in the single thread execution of the kickstart or preseed and is blocking the finishing of the install. Anything other than stop will hang the install or do a reboot loop

wdennis
2017-12-22 22:57
How does the runner "know" when the OS install process is complete?

greg
2017-12-22 22:58
You could set up the stages so that you don't have any tasks left and a stage with runnerwait set to false. It would accomplish the escape from the install process, but not run anything ever again.

greg
2017-12-22 22:59
It doesn't know. It just runs as part of the install process, and when it exits, the install process completes.

greg
2017-12-22 23:02
The first paragraph is how normal installs work when the last stage is complete-nowait. It sets the bootenv to local, has no tasks, and has the runnerwait flag set to false. The runner exits and the install finishes and reboots. The machine boots to local disk and the OS runs.

greg
2017-12-22 23:02
We want to do more so we do the longer stage sequence

greg
2017-12-22 23:03
In the krib example

wdennis
2017-12-22 23:13
@greg sorry I'm so thick, just wanting to understand...

wdennis
2017-12-22 23:13
So the stage-map is like this:
```
"change-stage/map": {
  "docker-install": "krib-install:Success",
  "finish-install": "docker-install:Success",
  "krib-install": "complete:Success",
  "runner-service": "finish-install:Stop",
  "ssh-access": "runner-service:Success",
  "ubuntu-16.04-install": "ssh-access:Success"
},
```

wdennis
2017-12-22 23:15
So a map that says `"ubuntu-16.04-install": "ssh-access:Success"` means: ???

greg
2017-12-22 23:50
"when in the ubuntu-16.04-install stage and the runner has no more tasks, change stage to ssh-access and see if there are more tasks to run"

wdennis
2017-12-23 00:34
So "Success" means "see if there are more tasks to run"?

wdennis
2017-12-23 00:34
Why did you name it "Success"?

wdennis
2017-12-23 00:41
And so let's see if I have it: `"runner-service": "finish-install:Stop"` means "when in the runner-service stage and the runner has no more tasks, change stage to the finish-install stage and Stop and wait" (for what?)

wdennis
2017-12-23 01:31
@greg ^^^ can I get you to answer the last two q's when you can?

greg
2017-12-23 01:34
Success means continue processing.

greg
2017-12-23 01:35
If nothing stops you

greg
2017-12-23 01:35
Why did you add "and wait"? Stop is just stop

greg
2017-12-23 01:36
Success was named success because it is the success path. It didn't fail. It didn't stop, it didn't reboot

wdennis
2017-12-23 01:37
I was assuming the "and wait" - because it's in the middle of a pipeline of stages. If it hard stopped, why would the pipeline continue?

wdennis
2017-12-23 01:41
I want to understand your logic, so that I can successfully construct my own stage maps instead of having to cargo-cult what you guys come up with and not understand why it fails... Sorry if my questions are annoying...

wdennis
2017-12-23 01:44
If `"finish-install:Stop"` stops the runner, what starts it up again that picks up and goes on to do `docker-install` etc.?

greg
2017-12-23 01:47
The runner-service installs two services. One that marks the machine runnable on startup and another that starts a runner.

greg
2017-12-23 01:48
This causes the machine to run a drpcli when the OS boots. The runner runs all the tasks in finish-install (no tasks) and then goes to the change stage map to change stage to docker-install (with success) to continue running tasks.

greg
2017-12-23 01:49
There are no assumed waits.

greg
2017-12-23 01:57
If you look at the bootenvs, sledgehammer (startup script) and the os-install bootenvs (through the kickseed files) mark the machine runnable and start the runner. This means that booting into those environments starts a task runnable runner. The finished OSes don't do this unless you add the runner service stage previously in the process.

greg
2017-12-23 02:00
Waiting for more tasks when none are present is controlled by the RunnerWait flag on the stage. If no tasks are present after a stage change that doesn't change stage (getting to the end of a workflow), the current stage decides whether to wait. Stop is a way for the workflow to exit the runner after a stage change, independent of the RunnerWait flag. Success just means continue doing what you are doing. Reboot is a way to say change stage and reboot before you run the next set of tasks.

wdennis
2017-12-23 02:15
This is very helpful. Thanks for the explanations.

wdennis
2017-12-23 02:16
I don't know why I have a mental block on understanding the "runner" concept and the way it works and is manipulated; I just don't find it intuitive somehow...

wdennis
2017-12-23 02:19
What I think I know at this point is:

wdennis
2017-12-23 02:23
- Stages have a bool (RunnerWait) that indicates to the runner process that is started either by an OS install stage or runner service instantiation to either continue running and wait for more tasks to hit the task list ("true"), or to terminate after the task list is completed ("false")

wdennis
2017-12-23 02:33
- Stage maps have a normal "happy path" that is named "Success" (i.e. keep going and process further tasks as they show up). Alternate paths are "failure" (something went wrong in the task, so terminate the runner and report an error), "Reboot" (get to the end of the current stage's task list, then reboot the machine and pick up processing more tasks thereafter), and "Stop" (after the current stage's task list is completed, act as if the stage has RunnerWait == "false", even if it is actually "true"). _Are the last two def's correct?_

greg
2017-12-23 02:41
Stop is exit regardless of task list. `Change stage and then stop` This leaves tasks in place to run next time the runner starts.
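
The behaviour described in this thread can be sketched in a few lines of Python. This is only a toy model of the explanation above, not DRP's actual runner code: the function names `next_step` and `run_until_exit` are hypothetical, and treating an entry without an action as `Success` is an assumption. It walks the KRIB `change-stage/map` quoted earlier and reports which stages a single runner invocation passes through before it exits.

```python
# Toy model of a runner walking a change-stage/map (not DRP's real code).
STAGE_MAP = {
    "docker-install": "krib-install:Success",
    "finish-install": "docker-install:Success",
    "krib-install": "complete:Success",
    "runner-service": "finish-install:Stop",
    "ssh-access": "runner-service:Success",
    "ubuntu-16.04-install": "ssh-access:Success",
}


def next_step(stage_map, current_stage):
    """Return (next_stage, action), or None if the map has no entry."""
    entry = stage_map.get(current_stage)
    if entry is None:
        return None
    next_stage, _, action = entry.partition(":")
    # Assumption: an entry without ":Action" is treated as Success.
    return next_stage, action or "Success"


def run_until_exit(stage_map, start_stage):
    """List the stages one runner invocation visits, and why it ended."""
    visited = [start_stage]
    stage = start_stage
    while True:
        step = next_step(stage_map, stage)
        if step is None:
            # End of the map: the stage's RunnerWait flag decides whether
            # the runner waits for more tasks or exits.
            return visited, "end-of-map"
        stage, action = step
        if stage in visited:
            raise ValueError(f"stage map loops back to {stage!r}")
        visited.append(stage)
        if action in ("Stop", "Reboot"):
            # The stage changes first; then the runner exits (Stop) or the
            # machine reboots (Reboot). Tasks of the new stage stay queued
            # for the next runner start.
            return visited, action


if __name__ == "__main__":
    print(run_until_exit(STAGE_MAP, "ubuntu-16.04-install"))
    print(run_until_exit(STAGE_MAP, "finish-install"))
```

In this model the first run ends in `finish-install` with `Stop`; the second run stands in for the runner service starting after the OS boots, and carries on from `finish-install` to `complete`.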

wdennis
2017-12-23 02:55
And the thing that starts the runner after it is stopped by finish-install:Stop in the krib stage-map is the runner service starting up after the OS install process completes and Linux starts the services for the "normal run level"?

greg
2017-12-23 04:02
yes

wdennis
2017-12-23 04:03
I do believe I'm beginning to understand...

wdennis
2017-12-24 14:06
:christmas_tree:Merry Christmas :gift: to the DR community and RackN folks!

shane
2017-12-24 15:11
Thx, @wdennis - to you and yours, too!

i.grischott
2017-12-24 18:44
@wdennis thanks. all the best for you and your family. :christmas_tree:

spector
2017-12-24 18:46
If you are bored this XMas and need a break - check out Rob's last Podcast of the year reviewing his 2017 predictions and 2018 thoughts/guesses. Have a Merry Christmas and Happy New Year! http://bit.ly/2BxEN8p

wdennis
2017-12-24 19:07
This can wait until after the holiday...

wdennis
2017-12-24 19:07
But, having some KRIB problems.

wdennis
2017-12-24 19:09
First pass with 4 servers worked as far as the KRIB install, but the k8s installed was broken...

wdennis
2017-12-24 19:09
```
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY   STATUS             RESTARTS   AGE
kube-system   calico-etcd-hwwn2                        1/1     Running            0          1d
kube-system   calico-kube-controllers-d6c6b9b8-2lvz4   1/1     Running            0          1d
kube-system   calico-node-bpzp2                        2/2     Running            0          1d
kube-system   calico-node-mr6hx                        2/2     Running            0          1d
kube-system   calico-node-vnscf                        2/2     Running            1          1d
kube-system   calico-node-x6g2b                        2/2     Running            0          1d
kube-system   etcd-testnode03                          1/1     Running            0          1d
kube-system   kube-apiserver-testnode03                1/1     Running            0          1d
kube-system   kube-controller-manager-testnode03       1/1     Running            0          1d
kube-system   kube-dns-6f4fd4bdf-5lzmh                 2/3     CrashLoopBackOff   589        1d
kube-system   kube-proxy-67745                         1/1     Running            0          1d
kube-system   kube-proxy-6nmh7                         1/1     Running            0          1d
kube-system   kube-proxy-lpc79                         1/1     Running            0          1d
kube-system   kube-proxy-ztkp2                         1/1     Running            0          1d
kube-system   kube-scheduler-testnode03                1/1     Running            0          1d
kube-system   kubernetes-dashboard-7b7b5cd79b-pvj6w    0/1     CrashLoopBackOff   286        1d
```

wdennis
2017-12-24 19:10
As you can see, kube-dns and kubernetes-dashboard pods could not start correctly

wdennis
2017-12-24 19:11
So after some stabs at troubleshooting (I'm too new to k8s to effectively t'shoot without help, which given holiday time was naturally in short supply...) I thought I'd just re-deploy the nodes again and hope 2nd time was the charm

wdennis
2017-12-24 19:12
So I did that, and the KRIB install did not complete successfully this time.


wdennis
2017-12-24 19:14
The chosen master machine had a failure in the krib-install stage...

wdennis
2017-12-24 19:15

wdennis
2017-12-24 19:15
And this was the message in the job that failed:

wdennis
2017-12-24 19:16

wdennis
2017-12-24 19:19
Looks like it tried to run kubeadm twice on the master node? Why would that be?

wdennis
2017-12-24 19:21
Overview is that I'm trying to do a KRIB run that installs U16.04 OS and then goes thru rest of install process (Docker and kubeadm run)

2017-12-24 20:19
ok... hell.... is the rebar kubernetes deployment working now ?

2017-12-24 20:24
meanwhile im struggling to get through a ubuntu or debian install with the same message.... no root file system is defined..... HELP!

greg
2017-12-25 00:50
@wdennis - you have to delete all the parameters except the krib-cluster-profile parameters to start over.

greg
2017-12-25 00:50
Kubeadm is fragile.

greg
2017-12-25 00:52
@outbackdingo - what are you booting? Make sure it has at least 2gb of memory and a disk.

greg
2017-12-25 00:53
Specifically /dev/sda. If there is no sda, you need more parameters specified.

shane
2017-12-25 00:56
@wdennis - the better pattern is to destroy all config related to krib - then recreate from scratch - as @greg says - kubeadm is extremely fragile

2017-12-25 05:41
theres definitely a 12Gb disk added to the profile for XOA, and its set to pxe boot

greg
2017-12-25 16:40
Does it have a "usb" drive or does it show up differently? Can you boot into sledgehammer and run lsblk?

2017-12-25 16:40
i can try that

2017-12-25 16:40
its not usb is basically an XenServer vm

2017-12-25 16:43
though how do i login to the image ?

2017-12-25 16:49
@rackneng whats the password to login

greg
2017-12-25 17:08
root rebar1

wdennis
2017-12-25 18:18
Got my KRIB Christmas miracle!
```
$ kubectl get nodes
NAME         STATUS   ROLES    AGE   VERSION
testnode01   Ready    <none>   3m    v1.9.0
testnode02   Ready    <none>   3m    v1.9.0
testnode03   Ready    master   15h   v1.9.0
testnode04   Ready    <none>   3m    v1.9.0
$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE
kube-system   calico-etcd-k7qm9                        1/1     Running   0          15h
kube-system   calico-kube-controllers-d6c6b9b8-d8hbs   1/1     Running   0          15h
kube-system   calico-node-dxfxt                        2/2     Running   0          3m
kube-system   calico-node-gpl57                        2/2     Running   1          3m
kube-system   calico-node-mfx87                        2/2     Running   1          3m
kube-system   calico-node-njvdp                        2/2     Running   0          15h
kube-system   etcd-testnode03                          1/1     Running   0          15h
kube-system   kube-apiserver-testnode03                1/1     Running   0          15h
kube-system   kube-controller-manager-testnode03       1/1     Running   0          15h
kube-system   kube-dns-6f4fd4bdf-bjtsb                 3/3     Running   0          15h
kube-system   kube-proxy-2hs25                         1/1     Running   0          3m
kube-system   kube-proxy-c9k8b                         1/1     Running   0          3m
kube-system   kube-proxy-dz2sj                         1/1     Running   0          15h
kube-system   kube-proxy-lslvc                         1/1     Running   0          3m
kube-system   kube-scheduler-testnode03                1/1     Running   0          15h
kube-system   kubernetes-dashboard-7b7b5cd79b-nhgw8    1/1     Running   0          15h
$
```
:christmas_tree: :santa:

2017-12-25 18:32
@rackneng yupp lsblk shows the VM has a 12G xvda drive

2017-12-25 18:32
@rackneng i want a KRIB under my tree also ... :P

wdennis
2017-12-25 18:36
The runner never came back online after the OS install on testnode[01,02,04], so had to "goose it" via bulk actions...

greg
2017-12-25 23:10
@outbackdingo you have a problem. The OS installs default to /dev/sda. You are using a Xen optimized drive.

greg
2017-12-25 23:11
You will need to add a parameter to all the machines like this.

greg
2017-12-25 23:11
Just a second ....

greg
2017-12-25 23:14
@outbackdingo - in the `part-scheme-default.tmpl` template for ubuntu and debian, it defines the drive and default lvm partition scheme.

greg
2017-12-25 23:15
You will see a variable called `operating-system-disk`. If it is unspecified, it will default to `sda`, which turns into /dev/sda. You should be able to set the string parameter `operating-system-disk` to `xvda`.

greg
2017-12-25 23:16
`xvda`

greg
2017-12-25 23:16
`xvda`

greg
2017-12-25 23:16
sigh - I wish I could type.

greg
2017-12-25 23:17
centos is a little more forgiving. It will attempt to use the first disk. Or you can specify the same parameter with the same short name, and it will attempt to use that.

greg
2017-12-25 23:19
You can put this in the global profile, or set it directly on the machine, or a new profile and add the profile to the machine.

greg
2017-12-25 23:19
The global profile from the cli would work like this:

greg
2017-12-25 23:19
`drpcli profiles set global param operating-system-disk to xvda`

greg
2017-12-25 23:20
This would make it global for all machines, but may be good enough for now.

2017-12-26 07:44
@rackneng ok, well that would be nice, however it breaks baremetal and possibly others at that point so i guess it needs to only apply to debian / ubuntu

2017-12-26 07:56
is there no way to define "sdX, xvdX"

vlowther
2017-12-26 14:31
What is xvd?

shane
2017-12-26 15:05
@outbackdingo - if you need to apply the `xvd` disk to just a select set of Machines - then create a new profile and add the param as @greg mentioned to that new profile. Now add that profile to the machines you want to utilize the `xvd` disk type.

shane
2017-12-26 15:07
If you need more complex logic for selecting a specific disk (if they don't all show up as `/dev/xvda`), then you may need to clone the appropriate BootEnv and Templates, and modify the Template to be correct for your environment ....

zehicle
2017-12-26 16:13
@wdennis session time out is top of my list for this week.

2017-12-26 18:13
@rackneng xvdX is a XenServer vhd created with XenClient or XOA .... the first disk being xvda, then xvdb and so on.....

shane
2017-12-26 18:13
yep - we got that

2017-12-26 18:14
just clarifying for vlowther

2017-12-26 18:15
and thanks for the input, ill try to work through creating a new profile for xen debian/ubuntu systems

greg
2017-12-26 18:17
A new task could be run during discovery to determine the OS install disk and write it as a parameter in the machine's parameter space.

greg
2017-12-26 18:17
This would handle this case generally. It could also be injected after raid configuration to pick up new named / marked drives. Things like that.
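
A minimal sketch of the disk-detection idea, assuming the task can read the output of `lsblk -dn -o NAME,TYPE` (whole devices only, no header). The function name `pick_install_disk`, the preference order and the profile name `xen-disks` are hypothetical; only the `operating-system-disk` parameter and the `drpcli profiles set ... param ... to ...` form come from the discussion above.

```python
def pick_install_disk(lsblk_output, preferred=("sda", "xvda", "vda")):
    """Pick a disk short name from `lsblk -dn -o NAME,TYPE` output.

    Prefers well-known names in order, otherwise falls back to the first
    device of type "disk". Returns None when no disk is listed.
    """
    disks = []
    for line in lsblk_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "disk":
            disks.append(fields[0])
    for name in preferred:
        if name in disks:
            return name
    return disks[0] if disks else None


def profile_command(profile, disk):
    """Build the drpcli call that stores the disk name on a profile."""
    return f"drpcli profiles set {profile} param operating-system-disk to {disk}"


if __name__ == "__main__":
    sample = "xvda disk\nsr0  rom\n"  # illustrative output from a Xen VM
    disk = pick_install_disk(sample)
    if disk is not None:
        print(profile_command("xen-disks", disk))
```

The sketch only builds the command string; how a discovery task would actually write the parameter back is left open here.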

greg
2017-12-26 18:18
For quick and dirty testing, just set `operating-system-disk` to `xvda` in the global profile and see what happens.

2017-12-26 18:21
Newbie question: I just started the "Quick Start" and it failed on this step:
$ ./drpcli bootenvs uploadiso sledgehammer
Error: GET: bootenvs/sledgehammer: Not Found
What can be wrong? Logs/dir-trees at http://wiki.4intra.net/User:StasFomin/Bugs/rebar

shane
2017-12-26 18:21
did you install the DRP Endpoint with "--no-content" flag ?

shane
2017-12-26 18:22
you need the `drp-community-content` to be installed for the bootenv to exist

2017-12-26 18:22
No. I follow
««««
# Run the following commands to start up dr-provision in a local isolated way.
# The server will store information and serve files from the ./drp-data directory.
sudo ./dr-provision --static-ip=<IP of an Interface> --file-root=`pwd`/drp-data/tftpboot --data-root=`pwd`/drp-data/digitalrebar > dr-provision.loc 2>&1 &
# Once dr-provision is started, the following commands will install the
# 'BootEnvs'. Sledgehammer is needed for discovery and other features,
# you can choose to install one or both of Ubuntu or Centos
./drpcli bootenvs uploadiso sledgehammer
./drpcli bootenvs uploadiso ubuntu-16.04-install
./drpcli bootenvs uploadiso centos-7-install
»»»»»»»

shane
2017-12-26 18:24
can you please run (make sure you have `jq` installed first): ```./drpcli bootenvs list | jq '.[].Name'```

2017-12-26 18:24
stas@stas-custis-desktop ~/apps/rebar $ ./drpcli bootenvs list | jq '.[].Name'
"ignore"
"local"

shane
2017-12-26 18:24
yes - the `drp-community-content` is not installed for some reason

shane
2017-12-26 18:24
have you run the UX ?


shane
2017-12-26 18:26
good - you need to accept the self-signed HTTPS certificate

shane
2017-12-26 18:27
then log in to the Endpoint - if you haven't changed the user/pass credentials - just hit "defaults" then log in ...

2017-12-26 18:27
Yes, but after restarting the service it fails with the cert. The first time was OK..


shane
2017-12-26 18:29
go to `Content Packages` in the left navigation bar

shane
2017-12-26 18:30
do you see `drp-community-content` in the right panel ?

shane
2017-12-26 18:30
if so - click "Transfer"

2017-12-26 18:30
Yes, click OK.

2017-12-26 18:31
Now ./drpcli bootenvs uploadiso looks working...

shane
2017-12-26 18:31
yay!

shane
2017-12-26 18:31
for some reason, your `drp-community-content` wasn't installed

2017-12-26 18:31
(not failing during several seconds). Thank you very much!

shane
2017-12-26 18:31
do you have a terminal session output of the install process ?

shane
2017-12-26 18:31
if possible, we'd like to see why the content wasn't installed ... if there's a problem of some sort we haven't seen before

2017-12-26 18:33
http://wiki.4intra.net/User:StasFomin/Bugs/rebar#Install_and_run

2017-12-26 18:34
I installed on Mandriva-Mageia like distro, ROSA Desktop Fresh with urpmi package manager.

2017-12-26 18:36
First time I installed in temp folder "bred" (I thought that it is temp folder, not target folder for bash-curl install), then I install once more in another specific folder.

shane
2017-12-26 18:37
??? - my google translate says this is Bulgarian, and "objective" - which I assume means "done" ?

2017-12-26 18:37
Sorry, it is russian "OK"

shane
2017-12-26 18:38
yeah, I thought Translate was wrong - since I thought the rest of your characters looked Russian to me ... :slightly_smiling_face:

2017-12-26 18:39
md5sum I think is all OK. bsdtar was missing and I installed it. No other warnings...

shane
2017-12-26 18:39
it looks like you might have started dr-provision pointing to a directory where it was NOT installed

shane
2017-12-26 18:40
so it started up, looking for the content in `--base-root=/data/rebar` - which was not the install location (`--base-root=/home/stas/apps/rebar/drp-data`)

2017-12-26 18:40
Hmm. I thought that "--base-root=" should be the data folder... separate from the installation...

2017-12-26 18:40
Sorry.

shane
2017-12-26 18:41
no worries - we'll make a note on that in the quickstart document to hopefully clear up any confusion for others

shane
2017-12-26 18:41
In the "isolated" mode, the content is installed in to the `drp-data` directory, in the Current Working Directory from where you ran the install

shane
2017-12-26 18:42
that's where the `--base-root` should point to - where the "content" and all of the runtime assets get installed to

2017-12-26 18:42
Quickstart told me about "--file-root" and "--data-root" but installer only about "--base-root"

shane
2017-12-26 18:43
hmm - I'll re-read that and see if I can clear up the documentation some - thanks for the feedback

2017-12-26 18:44
I thought that data-root should point to a partition with lots of space for ISOs... but rebar is installed on a small home partition... so I should point it to a large partition?

shane
2017-12-26 18:46
yes, ISOs will be installed in the `drp-data/tftpboot/` directory - you could shut down `dr-provision` service, move everything, then start it back up with a correctly adjusted `--base-root` flag

shane
2017-12-26 18:47
otherwise, the more detailed `install` documentation might help clarify the install and startup options, as opposed to the "quickstart" (which is designed to be short and concise-ish)


2017-12-26 18:49
Yes, of course I will study it, but now I wish to complete "Quick Start" and bootstrap some "bearmetal".

2017-12-26 18:49
Thank you once more!

shane
2017-12-26 18:49
:slightly_smiling_face:

shane
2017-12-26 18:49
no problem - let us know how your "bearmetal" experience goes !

2017-12-26 21:01
Some notes about QuickStart:
- The line with drpcli bootenvs list | jq ‘.[].Name’ - typo, incorrect quotes
- Nothing says that users must create a subnet through the GUI (or some other way) to get the DHCP server working. (I thought that local DHCP worked by default and spent a lot of time trying to debug a constantly crashing dhcpdump.)
- Maybe all "drpcli" commands should be "./drpcli"?

2017-12-26 21:06
After setting `drpcli prefs set unknownBootEnv discovery defaultBootEnv sledgehammer defaultStage discover` I saw that my box was loading sledgehammer by PXE. But after `drpcli prefs set unknownBootEnv discovery defaultBootEnv centos-7-install defaultStage discover` (without restarting dr-provision) I saw that the testbox was still loading sledgehammer. Is it a bug or a feature? What's wrong?

zehicle
2017-12-26 22:46
@belonesox the default stage overrides the default bootenv setting in this case. you'd need to have no default stage or change the stage to centos-7.

greg
2017-12-27 03:44
The default values are used when a machine is created without a value.

greg
2017-12-27 03:45
Once a machine is created, if you want to install the os, you need to change the stage and reboot the node.

zehicle
2017-12-27 03:51
@belonesox - if you would like, we can give you a slack account. http://rackn.com/slack

2017-12-27 14:14
The problem is that newly created vboxes still loaded with sledgehammer. I tried to change bootenv but failed:
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines list | grep UUI
"UUID": "2AE16CA9-4A90-41A3-90D9-C23DBFD78F57",
"UUID": "FD352D17-EFC3-4297-85CA-0CCBDD37F8CA",
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines bootenv 2AE16CA9-4A90-41A3-90D9-C23DBFD78F57 centos-7-install
Error: GET: machines/2AE16CA9-4A90-41A3-90D9-C23DBFD78F57: Not Found
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines bootenv FD352D17-EFC3-4297-85CA-0CCBDD37F8CA centos-7-install
Error: GET: machines/FD352D17-EFC3-4297-85CA-0CCBDD37F8CA: Not Found
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines list | grep UUI
"UUID": "2AE16CA9-4A90-41A3-90D9-C23DBFD78F57",
"UUID": "FD352D17-EFC3-4297-85CA-0CCBDD37F8CA",
stas@stas-custis-desktop /data/app/rebar $

greg
2017-12-27 15:29
@belonesox - newly created vboxes will boot sledgehammer unless you change the `defaultKnownStage` to `centos-7-install`.

greg
2017-12-27 15:29
sorry `defaultStage`

greg
2017-12-27 15:31
actually, you have to build a workflow to get this to work the way you are maybe describing.

greg
2017-12-27 15:35
What are you trying to do? You may want to watch some of @zehicle's youtube videos on workflows. Or @shane's 5min scripts and docs. To automatically install machines from discovery, you need to go through discover/sledgehammer to centos-7-install with a reboot. You can code this into a workflow change-stage/map. You can get close without workflows by setting `defaultStage` to `none`, the `defaultUnknownBootenv` to `discover` and the `defaultKnownBootEnv` to `centos-7-install`.

greg
2017-12-27 15:36
Once the machine is discovered, reboot it. It should then go through the centos-7-install process.

greg
2017-12-27 15:36
You basically have to discover the machine and then reboot it. Workflows can do the reboot.

2017-12-27 15:36
I'm trying to achieve a simple task: all boxes in the subnet will load centos 7 with a specific kickstart file.

greg
2017-12-27 15:36
otherwise, you need something to do it.

2017-12-27 15:37
First, I still can't force all boxes to load centos. Second, I see a dead link from http://digital-rebar.readthedocs.io/en/latest/user/cases/add_operating_system.html#ug-uc-edit-bootenv to https://github.com/rackn/digitalrebar-deploy/blob/master/containers/provisioner/update-nodes/templates/centos-7.ks.tmpl

greg
2017-12-27 15:38
yeah - those moved. Need to update the docs. Thanks for finding it.

2017-12-27 15:49
I have:
stas@stas-custis-desktop /data/app/rebar $ ./drpcli prefs list
{
  "baseTokenSecret": "PbddMkbveOfSB-0hlx8oRjzxA36ahC_A",
  "debugBootEnv": "2",
  "debugDhcp": "2",
  "debugFrontend": "1",
  "debugPlugins": "0",
  "debugRenderer": "2",
  "defaultBootEnv": "centos-7-install",
  "defaultStage": "centos-7-install",
  "knownTokenTimeout": "3600",
  "systemGrantorSecret": "RHdms47MX1cK6xDJMsB5YiwZFMXqiVm7",
  "unknownBootEnv": "discovery",
  "unknownTokenTimeout": "600"
}

2017-12-27 15:49
but newly created boxes still loading sledgehammer.

2017-12-27 15:51
What am I doing wrong?

zehicle
2017-12-27 15:51
@belonesox - unknown - discovery

zehicle
2017-12-27 15:51
are these machines new or already created?

zehicle
2017-12-27 15:52
you may have to set unknown to centos-7 too.

2017-12-27 15:52
stas@stas-custis-desktop /data/app/rebar $ ./drpcli prefs set unknownBootEnv centos-7-install
Error: POST: prefs: BootEnv centos-7-install cannot be used for the unknownBootEnv
stas@stas-custis-desktop /data/app/rebar $

2017-12-27 15:52
Now I create new boxes with VirtualBox.

zehicle
2017-12-27 15:54
have you been able to get it working if you set the stage?

2017-12-27 15:56
Sorry, set the stage for what?

2017-12-27 16:00
I see that
- I can not set unknownBootEnv to centos-7-install
- I can not change boot env for specific machine:
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines list | grep UUI
"UUID": "2AE16CA9-4A90-41A3-90D9-C23DBFD78F57",
"UUID": "FD352D17-EFC3-4297-85CA-0CCBDD37F8CA",
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines bootenv 2AE16CA9-4A90-41A3-90D9-C23DBFD78F57 centos-7-install
Error: GET: machines/2AE16CA9-4A90-41A3-90D9-C23DBFD78F57: Not Found
stas@stas-custis-desktop /data/app/rebar $

greg
2017-12-27 16:01
@belonesox, run this: `drpcli bootenvs list | grep Name`

greg
2017-12-27 16:02
Actually, you need to see if the `centos-7-install` bootenv and stage are available.

2017-12-27 16:02
https://kopy.io/Av5V7

zehicle
2017-12-27 16:02
@belonesox I'm giving bad advice.... the os*-install requires the machines to be defined. It would be like having the MACs defined in cobbler

greg
2017-12-27 16:03
@belonesox, `./drpcli bootenvs show centos-7-install`

2017-12-27 16:04
https://kopy.io/kdgJM

greg
2017-12-27 16:04
It is available and valid.

greg
2017-12-27 16:07
@belonesox, `./drpcli machines stage 2AE16CA9-4A90-41A3-90D9-C23DBFD78F57 none --force`

greg
2017-12-27 16:07
`./drpcli machines bootenv 2AE16CA9-4A90-41A3-90D9-C23DBFD78F57 centos-7-install --force`

2017-12-27 16:07
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines stage 2AE16CA9-4A90-41A3-90D9-C23DBFD78F57 none --force
Error: GET: machines/2AE16CA9-4A90-41A3-90D9-C23DBFD78F57: Not Found

greg
2017-12-27 16:07
umm -

greg
2017-12-27 16:08
`./drpcli machines list`

2017-12-27 16:08
https://kopy.io/n4rps

greg
2017-12-27 16:09
You are using the uuid of the motherboard. Your Uuid is ae271a28-9a57-4f21-bfef-1b8d8c27e9e2

greg
2017-12-27 16:10
`Uuid` is the machine field.

greg
2017-12-27 16:11
The UUID you are using is the motherboard's UUID. It is not guaranteed to be set or unique across all systems. Though for VirtualBox it is the VirtualBox id. Our virtual box plugin grabs that.
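
To make the `UUID` versus `Uuid` distinction concrete, here is a rough Python sketch that builds a lookup from hardware UUIDs to DRP machine `Uuid` values out of `drpcli machines list` JSON. It assumes only what the grep output above shows: each machine object has a top-level `Uuid`, and a `UUID` key appears somewhere inside it. Where exactly that key is nested is not shown in this log, so the sketch searches recursively; the sample data and its nesting are made up for illustration.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def hardware_to_drp_uuid(machines):
    """Map lower-cased hardware UUIDs to the DRP machine Uuid."""
    mapping = {}
    for machine in machines:
        for hw_uuid in find_values(machine, "UUID"):
            if isinstance(hw_uuid, str):
                mapping[hw_uuid.lower()] = machine["Uuid"]
    return mapping


if __name__ == "__main__":
    # Made-up sample; real input would come from `./drpcli machines list`.
    sample = json.loads("""
    [
      {"Uuid": "00000000-0000-0000-0000-00000000000a",
       "Params": {"inventory": {"System": {"UUID": "AAAAAAAA-0000-0000-0000-000000000001"}}}}
    ]
    """)
    print(hardware_to_drp_uuid(sample))
```

The value to pass to commands such as `drpcli machines bootenv` is the `Uuid` on the right-hand side of the mapping, not the hardware `UUID` used as the key.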

greg
2017-12-27 16:11
I need to be away for a while. I'll be back on later. Happy Holidays.

2017-12-27 16:12
So how to force centos-7-install for all by default?

zehicle
2017-12-27 16:29
for new machines, you can build a workflow that moves from discovery into centos-7 install as an automatic workflow. once the machine is discovered/known then you can just reset it back to centos-7.

2017-12-27 16:31
:( By hand? Possible to automate it? Why default env can not be just Centos-7

zehicle
2017-12-27 16:32
no, that's what workflows do

zehicle
2017-12-27 16:33
workflows move between stages automatically

zehicle
2017-12-27 16:33
"stagemap" parameter in the profile


2017-12-27 16:34
I see this in UI

zehicle
2017-12-27 16:35
what's your endpoint version?

zehicle
2017-12-27 16:35
DRP version

zehicle
2017-12-27 16:36
`drpcli info get`


2017-12-27 16:36
https://kopy.io/njOQ3

zehicle
2017-12-27 16:37
hmmm, you should be able to see workflow page

zehicle
2017-12-27 16:37
let me check something

zehicle
2017-12-27 16:45
I don't get that my v3.4 test right - trying a v3.5 install

zehicle
2017-12-27 16:45
you may need to refresh the page

2017-12-27 16:45
OK

2017-12-27 16:46
Page OK

2017-12-27 16:46
possible to create desired workflow by drpcli commands?

shane
2017-12-27 16:49
@belonesox - yes - the workflow can be created by adding a stagemap to the `global` profile, then any Machine booted/discovered by DRP Endpoint will be installed with your OS of choice (eg CentOS 7) - give me one minute, I'll dig up the CLI I use in my `5min-demo` harness

2017-12-27 16:52
Now I have the workflow


2017-12-27 16:52
But newly created boxes are still loaded by sledgehammer (hate it already)

zehicle
2017-12-27 16:53
I've duplicated the "workflow page is not allowed until after refresh" -> we'll see about how to fix

shane
2017-12-27 16:53
steps are:
1) create a JSON blob w/ the `change-stage/map` Parameter with the workflow you desire
2) add the Parameter to the `global` profile
3) boot a NEW machine - or force an existing Machine to re-run through workflow

Creating the JSON blob is as follows:
```
cat <<EOFPARAM > stagemap-param.json
{
  "discover": "centos-7-install:Reboot",
  "centos-7-install": "ssh-keys:Success",
  "ssh-keys": "complete-nowait:Success"
}
EOFPARAM
```
Now apply the Param to the `global` profile:
```
drpcli profiles set global param change-stage/map to - < stagemap-param.json
```

shane
2017-12-27 16:53
(sorry - let me remove the Packet specific stages from there - hold on)

shane
2017-12-27 16:55
In case you can't see edits - the correct JSON blob is:
```
cat <<EOFPARAM > stagemap-param.json
{
  "discover": "centos-7-install:Reboot",
  "centos-7-install": "ssh-keys:Success",
  "ssh-keys": "complete-nowait:Success"
}
EOFPARAM
```

shane
2017-12-27 16:55
this will also inject SSH keys in to the CentOS 7 installed node for you - if they are defined
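
Before loading a hand-written stage map, it can help to sanity-check its shape. The Python sketch below is a hypothetical helper, not part of `drpcli`: it checks that every value looks like `next-stage:Action` and that the action is one of `Success`, `Stop` or `Reboot`, which are the three actions named in this channel (the full list DRP accepts is an assumption here). It then writes the same `stagemap-param.json` file the heredoc above produces.

```python
import json

# Actions mentioned in this channel; DRP may accept others.
KNOWN_ACTIONS = {"Success", "Stop", "Reboot"}


def check_stage_map(stage_map):
    """Return a list of human-readable problems found in a stage map dict."""
    problems = []
    for stage, entry in stage_map.items():
        target, sep, action = entry.partition(":")
        if not sep or not target:
            problems.append(f"{stage}: expected 'next-stage:Action', got {entry!r}")
        elif action not in KNOWN_ACTIONS:
            problems.append(f"{stage}: unknown action {action!r}")
    return problems


if __name__ == "__main__":
    stage_map = {
        "discover": "centos-7-install:Reboot",
        "centos-7-install": "ssh-keys:Success",
        "ssh-keys": "complete-nowait:Success",
    }
    issues = check_stage_map(stage_map)
    if issues:
        raise SystemExit("\n".join(issues))
    # Write the file that the drpcli command above reads from stdin.
    with open("stagemap-param.json", "w") as handle:
        json.dump(stage_map, handle, indent=2)
```

The resulting file can then be fed to the `drpcli profiles set global param change-stage/map to -` command quoted above.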

shane
2017-12-27 16:57
`sledgehammer` is an extremely useful tool - it allows us to do some very interesting advanced workflow installations - and it also allows us to implement "immutable infrastructure" practices ... we typically don't see an installation requirement that is a simple "slap centos 7 on everything" - usually infrastructure we encounter has much more complex requirements

2017-12-27 16:57
so profiles, not workflows, are needed for this?

shane
2017-12-27 16:57
Profiles are used by workflow to define the stages to advance a Machine through

2017-12-27 16:57
can I specify my kickstart files in these params?

shane
2017-12-27 16:58
to use your own custom Kickstart, you'll want to "clone" an existing BootEnv (for example the existing centos-7-install BootEnv)

shane
2017-12-27 16:58
then you'll modify it to use a different Kickstart than our community provided kickstart

shane
2017-12-27 16:58
BUT

shane
2017-12-27 16:59
let's start w/ small steps - let's get you booting CentOS 7 installs first

shane
2017-12-27 16:59
let's get that working - and then we can evaluate how you modify the Kickstart to be customized for your needs

2017-12-27 17:00
stas@stas-custis-desktop /data/app/rebar $ ./drpcli profiles set global param change-stage/map to - < stagemap-param.json
{
  "centos-7-install": "ssh-keys:Success",
  "discover": "centos-7-install:Reboot",
  "ssh-keys": "complete-nowait:Success"
}

2017-12-27 17:00
But still .... SLEDGEHAMMER

shane
2017-12-27 17:00
how are you booting your Machines - did the machine exist already - or is it brand new ?

2017-12-27 17:01
brand new created VBox

shane
2017-12-27 17:01
if it existed already - it's already been "discovered" and you'll never match the workflow to kick off the work flow process

shane
2017-12-27 17:01
please provide output of `drpcli profiles show global`


2017-12-27 17:02
Every test - newly created. May be I should restart drp-service?

shane
2017-12-27 17:02
shouldn't be necessary

2017-12-27 17:02
stas@stas-custis-desktop /data/app/rebar $ ./drpcli profiles show global
{
  "Available": true,
  "Description": "Global profile attached automatically to all machines.",
  "Errors": [],
  "Meta": {
    "color": "blue",
    "icon": "world",
    "title": "Digital Rebar Provision"
  },
  "Name": "global",
  "Params": {
    "change-stage/map": {
      "centos-7-install": "ssh-keys:Success",
      "discover": "centos-7-install:Reboot",
      "ssh-keys": "complete-nowait:Success"
    }
  },
  "ReadOnly": false,
  "Validated": true
}

shane
2017-12-27 17:12
for the last Machine you booted that you said failed - can you please pull and post the Jobs logs for that machine? To do it from the CLI - get the Machine's UUID, and replace it as appropriate in the below `UUID` shell variable:
```
export UUID=<uuid_here>
./drpcli jobs list | jq ".[] | select(.Machine==\"$UUID\")"
```

2017-12-27 17:16
Last machine started loading sledgehammer so I halted it. Looks like drp does not know about the last machine:
stas@stas-custis-desktop /data/app/rebar $ VBoxManage list vms | grep test-rebar
"test-rebar" {fd352d17-efc3-4297-85ca-0ccbdd37f8ca}
"test-rebar-centos" {2ae16ca9-4a90-41a3-90d9-c23dbfd78f57}
"test-rebar-centos-2" {4abb1333-ee36-4cec-8de4-5d0002f25a31}
"test-rebar-centos-4" {ff8c3a52-fde6-4c59-bab4-7405ae2fb3be}
"test-rebar-centos-5" {a825f944-e3d3-4ec0-b0bc-8aae6096ea0d}
"test-rebar-centos-6" {25daff02-af44-4f2c-b60d-04b351ff3933}
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines list | grep Uuid
"Uuid": "90d9a5cd-bb35-4edc-81d8-b3a349099d14",
"Uuid": "9cbe8582-d713-4361-bca9-f11eb2bd0b85",
"Uuid": "ae271a28-9a57-4f21-bfef-1b8d8c27e9e2",
stas@stas-custis-desktop /data/app/rebar $

2017-12-27 17:17
so ./drpcli jobs list | jq ".[] | select(.Machine==\"25daff02-af44-4f2c-b60d-04b351ff3933\")" returns nothing

shane
2017-12-27 17:20
you need a machine UUID from Rebar - not from virtual box - they're different things

shane
2017-12-27 17:20
so it'd be either of the `90d...` , `9cb...`, or `ae2...` IDs

2017-12-27 17:30
Sorry, cannot find/match appropriate VBox "test-rebar-centos-6" and drp machines. See https://kopy.io/zANQF Also no drpcli machines match MAC 080027ADB4C3

2017-12-27 17:42
Created yet another "test-rebar-centos-7" https://vimeo.com/195094441 and waited until it booted.
stas@stas-custis-desktop /data/app/rebar $ ./drpcli machines list | grep Uuid
"Uuid": "42620a9a-a375-417c-a5c5-6ec694aeb9a0",
"Uuid": "90d9a5cd-bb35-4edc-81d8-b3a349099d14",
"Uuid": "9cbe8582-d713-4361-bca9-f11eb2bd0b85",
"Uuid": "ae271a28-9a57-4f21-bfef-1b8d8c27e9e2",
Wow! This "test-rebar-centos-7" surely must be "42620a9a-a375-417c-a5c5-6ec694aeb9a0" in DRP. But
stas@stas-custis-desktop /data/app/rebar $ ./drpcli jobs list | jq ".[] | select(.Machine==\"42620a9a-a375-417c-a5c5-6ec694aeb9a0\")"

2017-12-27 17:49
current machines list: https://kopy.io/jeDUY

2017-12-27 17:50
drpcli jobs list: https://kopy.io/yIpLz

wdennis
2017-12-27 18:08
- does v3.5.0 have the "dynamic refresh" of machine state thing?

shane
2017-12-27 18:08
can you pull just the job log for the Machine UUID starting with `90d...`

shane
2017-12-27 18:08
@wdennis that's a UX side feature, and it is in the current UX version

wdennis
2017-12-27 18:08
Doesn't seem to be working on my DRP

wdennis
2017-12-27 18:09
(stable 3.5.0)

shane
2017-12-27 18:09
it's also only in certain fields - not all fields/pages fully "dynamically refresh"

wdennis
2017-12-27 18:09
I thought "Bulk Actions" did?

shane
2017-12-27 18:10
Bulk Actions fields in the table should dynamically refresh, yes

wdennis
2017-12-27 18:10
Hmmm, it's not for me (until I click the top "Refresh" button)

shane
2017-12-27 18:13
"Works For Me" :tm:

shane
2017-12-27 18:15
(unless it broke in the last 3 or 4 days - I haven't tested it since before xmas)


wdennis
2017-12-27 18:16
(j/k - and yes I know it's Beta!)

shane
2017-12-27 18:17
we have a lot of UX fixes going in this week ... @meshiest is back and he and @zehicle have been hammering on some of the pernicious bugs and features

zehicle
2017-12-27 18:19
@wdennis you should be able to see events scrolling in the event viewer

zehicle
2017-12-27 18:19
those should be reflected live on lists

zehicle
2017-12-27 18:19
if you don't have events on the view, then they won't make it anywhere else

wdennis
2017-12-27 18:19
Oh also @shane - what's involved with enabling alternate kickstart/preseed templates in a profile? Lots of work?

wdennis
2017-12-27 18:20
@zehicle Let me check...

2017-12-27 18:20
@shane All logs for 90d9a5cd-bb35-4edc-81d8-b3a349099d14 ? https://kopy.io/Xqbb4

shane
2017-12-27 18:20
not lots of work - just got back burnered and waiting on some normalization on OS family stage from ... @vlowther

wdennis
2017-12-27 18:20
@vlowther blocker

vlowther
2017-12-27 18:20
hm?

wdennis
2017-12-27 18:22
My feature request that @shane is working on he says is waiting on your work... ^^^


vlowther
2017-12-27 18:23
Sorry, Christmas and per-request log threading have eaten my brain.

wdennis
2017-12-27 18:23
"My brain hurts!"

vlowther
2017-12-27 18:24
heh

vlowther
2017-12-27 18:24
@shane, that PR looks complete in and of itself.

shane
2017-12-27 18:24
the `templates/select-kickseed.tmpl` needs to be updated to reflect `OS.Family` correctly to match the kickstart/seed appropriately

vlowther
2017-12-27 18:25
Ah.

shane
2017-12-27 18:25
we never resolved the if/eq statements to match

shane
2017-12-27 18:26
you said "something" needed changing to model that right ... or... something ... I forget, that was back a few weeks ago (by general rule, I never remember anything before lunch ... )

vlowther
2017-12-27 18:28
hm, don't see an issue about it, so it is probably written down on the whiteboard in the office.

wdennis
2017-12-27 18:29
@zehicle Does the DRP-UX browser machine have to be reachable by the DRP server to enable the events viewer feature?

vlowther
2017-12-27 18:30
No more than it has to be reachable for any other UX interaction.

zehicle
2017-12-27 18:30
no, it's one way

wdennis
2017-12-27 18:30
OK. It was not working when the browser (on my laptop) was on my home network

vlowther
2017-12-27 18:30
The UX receives events by opening a websocket on the server (which is glorified TCP over HTTP), then the server pushes events over the open channel.

wdennis
2017-12-27 18:31
I access work thru a site-to-site VPN tunnel, but work's network does not have a route to my home network...

vlowther
2017-12-27 18:31
websockets hijack the HTTP session

wdennis
2017-12-27 18:32
So if the DRP-server machine was trying to initiate traffic to my laptop on my home network, then I could see how it could fail

vlowther
2017-12-27 18:32
It is not.

vlowther
2017-12-27 18:32
All the initiation happens on the client side.

shane
2017-12-27 18:32
(client in this case is your web browser)
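
A minimal sketch of that direction of travel, using a plain TCP socket instead of a real websocket (an intentional simplification): the client opens the connection, and the server only pushes data back over the channel the client opened. The event names are invented for the example.

```python
import socket
import threading

def serve_once(listener: socket.socket, events: list) -> None:
    """Accept one client connection and push events over it."""
    conn, _ = listener.accept()
    with conn:
        for event in events:
            conn.sendall((event + "\n").encode())

def receive_events(port: int) -> list:
    """Client side: open the connection, then read whatever the server pushes."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        data = b""
        while True:
            chunk = client.recv(4096)
            if not chunk:  # server closed the channel
                break
            data += chunk
    return data.decode().splitlines()

def demo() -> list:
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=serve_once, args=(listener, ["machines.update", "jobs.create"])
    )
    server.start()
    received = receive_events(port)
    server.join()
    listener.close()
    return received
```

The server never dials out, so it needs no route back to the browser's network; it only needs the client-initiated connection to stay open.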

wdennis
2017-12-27 18:32
OK. Wonder why it wasn't working?

wdennis
2017-12-27 18:33
I'm back at work, and it is working now.

vlowther
2017-12-27 18:33
Any one of a frankly ridiculous number of reasons

wdennis
2017-12-27 18:33
Bah, computers!

shane
2017-12-27 18:34
the answer is to just never leave work ....

wdennis
2017-12-27 18:34
Yes, that's it...

wdennis
2017-12-27 18:34
I'll try it again tonight

2017-12-27 19:14
@shane About my case, I sent you an email on how to ssh to my desktop... I am not sure if private communication is possible in gitter?

shane
2017-12-27 19:27
you can join slack w/ us via requesting an invite at http://rackn.com/slack

shane
2017-12-27 19:27
not sure where your email went to

2017-12-27 19:27
shanev@gmail

shane
2017-12-27 19:27
nope - not me ... :slightly_smiling_face:

2017-12-27 19:27
aaaa

shane
2017-12-27 19:27

2017-12-27 19:31
Sent to this mail, also requested slack access.


2017-12-27 19:33
That is why I got the wrong email.

shane
2017-12-27 19:34
dunno who "Shane Vitarana" is ... I'm "Shane Gibson"

shane
2017-12-27 19:34
there's only one true Shane - that's me - the rest are all imposters

2017-12-27 19:34
I just realized that this is not a real gitter chat but a replication from slack

2017-12-27 19:35
That is why I use standard gitter interface to get contacts ....

stanislav.fomin
2017-12-27 19:41
has joined #json

stanislav.fomin
2017-12-27 19:43
Hello, @shane, did you receive the mail, and can you login to my box?

shane
2017-12-27 19:43
@stanislav.fomin - welcome to our Slack - yes got email

shane
2017-12-27 19:45
all four of your machines show they are in centos-7-install bootenv

shane
2017-12-27 19:45
what makes you think they're still in sledgehammer ?

shane
2017-12-27 19:46
sledgehammer happens to ALSO be CentOS 7 as well - so the console prompt will look similar

stanislav.fomin
2017-12-27 19:47

stanislav.fomin
2017-12-27 19:48
last freshly created "test-rebar-centos-7"

stanislav.fomin
2017-12-27 19:48
Should I create another test-rebar-centos-8?

shane
2017-12-27 19:50
nope - do you have any other DHCP or provisioning systems on the same network ?

stanislav.fomin
2017-12-27 19:50
provisioning ? no. DHCP ? yes, but local DRP-DHCP usually processed requests faster

shane
2017-12-27 19:53
not guaranteed to be faster - so you could have conflict there

shane
2017-12-27 19:54
can you plz restart DRP - with the following: ```sudo ./dr-provision --static-ip=172.31.1.3 --base-root=/data/app/rebar/drp-data --local-content= --default-content= > /tmp/drp.log 2>&1 &```

stanislav.fomin
2017-12-27 19:54
Yes, it is possible, but I think it does not matter for this "CentOS 7 instead of SH" problem

stanislav.fomin
2017-12-27 19:56
Hmm...
stas@stas-custis-desktop /data/app/rebar $ sudo ./dr-provision --static-ip=172.31.1.3 --base-root=/data/app/rebar/drp-data --local-content= --default-content= > /tmp/drp.log 2>&1 &
[1] 19701
[1]+ ??????????? sudo ./dr-provision --static-ip=172.31.1.3 --base-root=/data/app/rebar/drp-data --local-content= --default-content= > /tmp/drp.log 2>&1
stas@stas-custis-desktop /data/app/rebar $ ps -ef | grep dr-p
root 19701 18083 0 22:55 pts/3 00:00:00 sudo ./dr-provision --static-ip=172.31.1.3 --base-root=/data/app/rebar/drp-data --local-content= --default-content=
stas 19716 18083 0 22:55 pts/3 00:00:00 grep --color dr-p
stas@stas-custis-desktop /data/app/rebar $ tail -f /tmp/drp.log

stanislav.fomin
2017-12-27 19:56
nothing in /tmp/drp.log

stanislav.fomin
2017-12-27 19:57
OK?

shane
2017-12-27 19:59
it's not running: ```test@stas-custis-desktop /data/app/rebar $ ./drpcli info get 2017/12/27 22:58:54 Error creating session: CLIENT_ERROR: Get https://127.0.0.1:8092/api/v3/users/rocketskates/token: dial tcp 127.0.0.1:8092: getsockopt: connection refused```

stanislav.fomin
2017-12-27 20:00
dr-provision2017/12/27 20:00:12.116728 Version: v3.5.0-0-8dd3ac9c62a2555d315e07f5a190f2230e3a7ca7
dr-provision2017/12/27 20:00:12.116766 Extracting Default Assets
dr-provision2017/12/27 20:00:12.435654 Failed to render unknown bootenv: StartupError: bootenvs/centos-7-install: BootEnv centos-7-install cannot be used for the unknownBootEnv

shane
2017-12-27 20:01
ah - yes, we need to set our system preferences back

stanislav.fomin
2017-12-27 20:02
stas@stas-custis-desktop /data/app/rebar $ ./drpcli prefs set unknownBootEnv discovery
2017/12/27 23:01:50 Error creating session: CLIENT_ERROR: Get https://127.0.0.1:8092/api/v3/users/rocketskates/token: dial tcp 127.0.0.1:8092: getsockopt: connection refused
stas@stas-custis-desktop /data/app/rebar $

stanislav.fomin
2017-12-27 20:02
?

shane
2017-12-27 20:03
:slightly_smiling_face: no DRP Endpoint to talk to - you can't change something not running

shane
2017-12-27 20:03
hold on - I'm finding the wiring for it in the config files

stanislav.fomin
2017-12-27 20:03
Check-and-mate

shane
2017-12-27 20:05
you'll need to modify 3 files in the filesystem - so they match the `"val":` JSON value like: ```defaultStage discover defaultBootEnv sledgehammer unknownBootEnv discovery```

shane
2017-12-27 20:05
files are in `drp-data/digitalrebar/preferences/`

shane
2017-12-27 20:05
unknownBootEnv.json defaultStage.json defaultBootEnv.json
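
A rough sketch of that manual fix in Python, to be run only while dr-provision is stopped. It assumes each of the three files is a JSON object with a `"val"` key, which is all the message above states; any other keys are left untouched. The function name `reset_prefs` is hypothetical.

```python
import json
from pathlib import Path

# Preference values quoted in the chat above.
WANTED = {
    "defaultStage": "discover",
    "defaultBootEnv": "sledgehammer",
    "unknownBootEnv": "discovery",
}

def reset_prefs(prefs_dir: Path, wanted: dict = WANTED) -> None:
    """Rewrite the "val" key of each <name>.json file, keeping other keys."""
    for name, value in wanted.items():
        path = prefs_dir / f"{name}.json"
        doc = json.loads(path.read_text())
        doc["val"] = value
        path.write_text(json.dumps(doc))

# Usage (path taken from the chat, relative to the DRP base root):
# reset_prefs(Path("drp-data/digitalrebar/preferences"))
```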

zehicle
2017-12-27 20:06
welcome @stanislav.fomin

stanislav.fomin
2017-12-27 20:09
started ... OK /tmp/drp.log

shane
2017-12-27 20:10
do: `./drpcli prefs set defaultStage discover defaultBootEnv sledgehammer unknownBootEnv discovery`

shane
2017-12-27 20:10
(rather, I did ... that)

stanislav.fomin
2017-12-27 20:11
I should repeat this command?

shane
2017-12-27 20:11
I'd suggest destroying all of the current machines, wiping out the jobs logs, then starting over, so we have a clean setup - and no extra logs and machines cluttering things up

shane
2017-12-27 20:11
(no - I did it already)

stanislav.fomin
2017-12-27 20:11
OK, perform a clean installation somewhere else?

shane
2017-12-27 20:12
if you're ok w/ me wiping the machines and jobs log - I'll do that right now

shane
2017-12-27 20:12
you'll want to destroy the extra machines in VirtualBox

shane
2017-12-27 20:12
(or we can just reuse them - should be fine after destroying inside DRP)

stanislav.fomin
2017-12-27 20:13
Should I destroy all VMs test-rebar in VirtualBox?

shane
2017-12-27 20:13
no - I don't think you need to

shane
2017-12-27 20:14
we'll reuse the vbox nodes - just give me a minute to clean up the system

stanislav.fomin
2017-12-27 20:14
I think if I run `./drpcli prefs set defaultStage discover defaultBootEnv centos-7-install unknownBootEnv discovery` on a clean, fresh installation, everything will be OK. But I am interested in how to fix this situation (going from sledgehammer to CentOS)?

shane
2017-12-27 20:15
```test@stas-custis-desktop /data/app/rebar $ for K in `./drpcli machines list | jq -r '.[].Uuid'`; do echo "NUKE: $K";./drpcli machines destroy $K; done NUKE: 42620a9a-a375-417c-a5c5-6ec694aeb9a0 Deleted machine 42620a9a-a375-417c-a5c5-6ec694aeb9a0 NUKE: 90d9a5cd-bb35-4edc-81d8-b3a349099d14 Deleted machine 90d9a5cd-bb35-4edc-81d8-b3a349099d14 NUKE: 9cbe8582-d713-4361-bca9-f11eb2bd0b85 Deleted machine 9cbe8582-d713-4361-bca9-f11eb2bd0b85 NUKE: ae271a28-9a57-4f21-bfef-1b8d8c27e9e2 Deleted machine ae271a28-9a57-4f21-bfef-1b8d8c27e9e2```

stanislav.fomin
2017-12-27 20:15
Should I halt all VMs in VirtualBox?

shane
2017-12-27 20:15
nope

shane
2017-12-27 20:16
```test@stas-custis-desktop /data/app/rebar $ for K in `./drpcli jobs list | jq -r '.[].Uuid'`; do ./drpcli jobs destroy $K; done Deleted job 0ee3101e-4fa6-47fe-b184-6cd416408280 <...snip...> Deleted job d4bd5d66-8dfb-463e-b37e-19afc26c8068```

shane
2017-12-27 20:16
(you can see the subsequent actions now in the `/tmp/drp.log` file - just for reference)

stanislav.fomin
2017-12-27 20:17
OK, don't worry about it

shane
2017-12-27 20:22
@stanislav.fomin go ahead and reboot one of your vbox nodes now

stanislav.fomin
2017-12-27 20:22
reboot or create new?

shane
2017-12-27 20:23
you should be able to just reboot an existing empty vbox node - and it'll DHCP/PXE off of DRP now

stanislav.fomin
2017-12-27 20:24
Aaaaa sledgehammer

stanislav.fomin
2017-12-27 20:24
on another new box


shane
2017-12-27 20:26
ok - we have forward progress ... ! but not success !

stanislav.fomin
2017-12-27 20:27
:disappointed:

shane
2017-12-27 20:27
it rebooted successfully based on the workflow to start the CentOS install - but it failed on "no bootable medium" to the Virtual Machine


stanislav.fomin
2017-12-27 20:28
VirtualBox cannot automatically select PXE

stanislav.fomin
2017-12-27 20:28
User have to press F12, select lan boot, etc....

stanislav.fomin
2017-12-27 20:29
Yes, after manual install it loads centos 7

stanislav.fomin
2017-12-27 20:29
right now....

shane
2017-12-27 20:30
yeah - I see it walking through the install right now

stanislav.fomin
2017-12-27 20:30
But this automation should not need manual actions... not OK

shane
2017-12-27 20:31
we don't automate VirtualBox - we automate bare metal (bearmetal ... remember that quip you made ... :slightly_smiling_face: )

stanislav.fomin
2017-12-27 20:31
Why is it not possible to get rid of sledgehammer and boot Centos 7 directly?

shane
2017-12-27 20:31
we do have a VirtualBox plugin - and I think it will do some of this automation - I don't play with it much

shane
2017-12-27 20:32
we don't do things that way - Sledgehammer is required to inventory, collect machine information, start a `runner` to process Jobs/Task to implement Workflow, etc ...

stanislav.fomin
2017-12-27 20:32
Hmm

shane
2017-12-27 20:32
it also allows us to provide protections about what should and shouldn't be imaged

shane
2017-12-27 20:33
if you just boot anything against a Provisioner that automatically installs an Operating System - you can very easily nuke something you didn't mean to

shane
2017-12-27 20:33
yes, you can control it by setting the Machine boot options (i.e. do not PXE boot) - but many production operations shops don't do it that way

shane
2017-12-27 20:34
Now - I _think_ that we can do it straight to CentOS install - if you create the Machine first - if the machine is created - before you first time boot it - it can then go straight to CentOS install

stanislav.fomin
2017-12-27 20:35
Yes, OK. So what is the problem?

shane
2017-12-27 20:35
but that requires the `machine` object to be created first

stanislav.fomin
2017-12-27 20:35
What is the right way to automatically recreate this workflow?

shane
2017-12-27 20:35
our view of Provisioning is based on creating workflows for environments with 1000s, 10000s, and more Machines

shane
2017-12-27 20:35
we have to be flexible to implement provisioning across huge infrastructure

shane
2017-12-27 20:36
we generally don't have shops that want to slap CentOS 7 only on machines straight out of the box

shane
2017-12-27 20:36
so - the problem was the `prefs` conflicting with the workflow (the "global" profile param I had you add)

shane
2017-12-27 20:37
once we set the `prefs` for the unknown/default BootEnv/Stage correctly - then the workflow to provision Machines to CentOS 7 install is working

shane
2017-12-27 20:37
the last piece you need is to "automate" the reboot actions within the VirtualBox environment

stanislav.fomin
2017-12-27 20:38
OK. What is the command line to create this "discover"->"centos7-install" workflow?

shane
2017-12-27 20:39
you did it already :slightly_smiling_face:

stanislav.fomin
2017-12-27 20:39
I did it by UI

stanislav.fomin
2017-12-27 20:39
So I wish to know two coherent magic spells: * workflow * global prefs

shane
2017-12-27 20:40
`prefs` is separate from `global` Profile

shane
2017-12-27 20:41
`prefs` sets the default conditions for Unknown and Default stages (workflow) and BootEnvs (install environments)

shane
2017-12-27 20:41
`global` Profile is applied to all machines that get installed

shane
2017-12-27 20:41
in this case, we used the `global` profile to specify that ALL machines should ALWAYS install ONLY centos-7-install workflow

shane
2017-12-27 20:42
in "real world use" - you'd normally `drpcli profiles create ... ` to create a new Profile for a given use case, then you'd apply this specific new Profile to a set of Machines

shane
2017-12-27 20:43
that Profile would specify the appropriate workflow stagemap (named `change-stage/map`)

shane
2017-12-27 20:43
other Machines might have *other* profiles with different settings - different BootEnvs, different Params, etc...

shane
2017-12-27 20:44
there are 2 more pieces you need to do 1. install the VirtualBox plugin 2. set `access-keys` to add SSH keys to inject in to the CentOS nodes that get installed

shane
2017-12-27 20:44
(because I'm assuming you want to be able to log in to them !)

stanislav.fomin
2017-12-27 20:44
"be able to log in to them" - there are no default passwords in your kickstarts?

shane
2017-12-27 20:45
heck NO !!

stanislav.fomin
2017-12-27 20:45
ok-ok-ok

shane
2017-12-27 20:45
that'd be a huge security hole - imagine 200 machines in an environment that get installed with known user/pass pairs ... !!

stanislav.fomin
2017-12-27 20:46
now I wish to try a baremetal install - but for this I have to unplug the ethernet cable from the hub (to avoid a DHCP conflict)... and your ssh connection will drop. OK?

shane
2017-12-27 20:46
sure - but we should fix your DRP endpoint

shane
2017-12-27 20:47
so that you can get access-keys setup correctly first

shane
2017-12-27 20:48
basically - you want to inject the Public key half of a private/public key pair that you will use to SSH to the built machines

stanislav.fomin
2017-12-27 20:48
OK, what command for access-keys?

shane
2017-12-27 20:48
like this: ```drpcli profiles set global param access-keys to '{ "username": "ssh-rsa <...SSH_PUBLIC_KEY_HALF_HERE...> MY_KEY_NAME" }'```

shane
2017-12-27 20:49
replace `username`, `<...SSH_PUBLIC_KEY_HALF_HERE...>`, and `MY_KEY_NAME`
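
A small helper sketch for building that command without hand-quoting the JSON. The `drpcli profiles set global param access-keys to ...` form is the one quoted above; the function name, the sample user, and the placeholder key are invented for illustration.

```python
import json
import shlex

def access_keys_command(username: str, public_key: str, key_name: str = "") -> list:
    """Build the argv for setting access-keys on the global profile."""
    entry = public_key.strip()
    if key_name:
        entry = f"{entry} {key_name}"
    payload = json.dumps({username: entry})
    return ["drpcli", "profiles", "set", "global", "param", "access-keys", "to", payload]

# Placeholder key material; substitute the contents of your real .pub file.
argv = access_keys_command("stas", "ssh-rsa AAAAexample", "stas@desktop")
print(shlex.join(argv))  # shell-quoted form, ready to paste
```

Building the payload with `json.dumps` avoids quoting mistakes when the key or its comment contains characters the shell would otherwise interpret.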

stanislav.fomin
2017-12-27 20:49
Maybe drpcli can grab it from github? (https://github.com/belonesox.keys)

shane
2017-12-27 20:49
not from github - but we have plugins that operate in other environments to get keys (for example in http://packet.net we get the SSH key half from the Packet metadata services)

stanislav.fomin
2017-12-27 20:51

shane
2017-12-27 20:52
the above `drpcli` command adds the `access-keys` Parameter to the _global profile_ which means any and all machines provisioned AFTER you add this param - will get the SSH key injected in to the `~root/.ssh/authorized_keys` file

shane
2017-12-27 20:54
I think that's fine - there's no "key name" added so it won't be documented in the authorized_keys file

shane
2017-12-27 21:11
last step is to install the VirtualBox plugin - if you want to continue to use the VirtualBox environment
```
curl -s -o /tmp/virtualbox-ipmi-plugin https://s3-us-west-2.amazonaws.com/rebar-catalog/virtualbox-ipmi/v1.4.0-0-2bad605790ca75f85add0414ea2624684ae0a499/amd64/linux/virtualbox-ipmi
./drpcli plugin_providers upload virtualbox-plug from /tmp/virtualbox-ipmi-plugin
echo '{ "Name": "virtualbox-ipmi", "Available": true, "Validated": true, "ReadOnly": false, "Provider": "virtualbox-ipmi", "Errors": [], "Params": { "virtualbox/user": "stas" } }' > /tmp/plug.json
./drpcli plugins create - < /tmp/plug.json
```

stanislav.fomin
2017-12-27 21:12
I successfully installed Centos7 on baremetal, thank you.

shane
2017-12-27 21:12
NOTE: the `virtualbox/user` is set to `stas` (I think that was your username)

shane
2017-12-27 21:12
it must be set to your username - and only works for a specific non-root user - that's a limitation of VirtualBox

stanislav.fomin
2017-12-27 21:13
Now I have to go (midnight in Moscow, subway will be closed soon...)

shane
2017-12-27 21:13
the plugin implements Power events (reboot, power on, power off), and nextboot (PXE/disk) stuff for automating VirtualBox more

shane
2017-12-27 21:13
glad to hear the bearmetal install was successful !!

stanislav.fomin
2017-12-27 21:13
last question: why can't the UI (172.31.1.3:8092) work locally, without internet access?

shane
2017-12-27 21:14
there is no embedded HTTP UI/UX on the DRP Endpoint

stanislav.fomin
2017-12-27 21:14
Only redirect to your service?

shane
2017-12-27 21:14
our SaaS/Portal is run by us, so we can apply updates, new features, etc.... at a MUCH faster pace than most people deploy new updates to the DRP Endpoint service locally

shane
2017-12-27 21:15
yes - but it's a single-page CORS application - which means it downloads the page to your browser and runs in browser - we do connect back to our Content Library system for getting content and managing Portal based logins

stanislav.fomin
2017-12-27 21:15
Yes, but I think about deploying to private clouds... without internet access

shane
2017-12-27 21:15
your DRP Endpoint only connects to YOUR laptop/browser session - it does NOT connect to our Portal

shane
2017-12-27 21:16
yes - and we have a roadmap plan for Enterprise customers to be able to deploy a local Portal inside their environment

shane
2017-12-27 21:16
we don't have that feature enabled yet - but it's on the road map for our paying enterprise customers

stanislav.fomin
2017-12-27 21:16
OK, thank you.

2017-12-28 14:36
so do we have a working KRIBs yet ??

shane
2017-12-28 14:43
oh yes - KRIB has been working for quite some time

zehicle
2017-12-28 15:27
@stanislav.fomin if stand alone portal is interesting then we should talk. It's a near term beta item that we could bundle into other support features.

stanislav.fomin
2017-12-28 16:10
It is not urgent, I will be on New Year's holidays until 11 Jan... maybe sometime later


shane
2017-12-28 19:52
@stanislav.fomin - that is OLD (version 2) documentation, please use the "latest" doc version for Digital Rebar Provision, located at: http://provision.readthedocs.io/en/latest/README.html#

shane
2017-12-28 19:52
(Digital Rebar v2 is a fairly different beast from Digital Rebar Provision v3)

stanislav.fomin
2017-12-29 17:44
What is the right way to specify bootenvs? I tried to import bootenvs from YAML files:
$ drpcli bootenvs create --format=yaml - < /etc/digital-rebar/test.yaml
Error: ValidationError: bootenvs/centos-7-install
Templates[0]: No common template for default-pxelinux.tmpl
Templates[1]: No common template for default-elilo.tmpl
Templates[2]: No common template for default-ipxe.tmpl
Templates[3]: No common template for centos-7.ks.tmpl
and I cannot find how to export/import templates....

shane
2017-12-29 17:48
here's a valid work flow - showing how to copy an existing BootEnv, change its Name (minimum required change), and then add it back to the system:
```
drpcli bootenvs list | jq '.[].Name' | grep cent
"centos-7-install"
"centos-7.4.1708-install"
drpcli bootenvs show centos-7-install --format=yaml > centos.yaml
sed -i.bak 's/^\(Name: \).*$/\1my-centos/g' centos.yaml
drpcli bootenvs create - < centos.yaml
drpcli bootenvs list | jq '.[].Name' | grep cent
"centos-7-install"
"centos-7.4.1708-install"
"my-centos"
```

shane
2017-12-29 17:50
Your errors however, relate to validation

shane
2017-12-29 17:51
all pieces of a bootenv must exist for the bootenv to be created

shane
2017-12-29 17:52
is this the same system you were playing with previously - or a newly built one ?

shane
2017-12-29 17:52
do you have the `drp-community-content` installed in this system - it's referring to templates that exist in that content pack

stanislav.fomin
2017-12-29 17:53
No, I am trying to do a reproducible, idempotent install with ansible

stanislav.fomin
2017-12-29 17:53
I have no bootenvs except "local" and "ignore"

shane
2017-12-29 17:54
can you plz do: `drpcli contents list | jq '.[].meta.Name'`

stanislav.fomin
2017-12-29 17:54
and I am trying to add a new bootenv (and fully understand what every option means)

stanislav.fomin
2017-12-29 17:54
[vagrant@dr-provision digital-rebar]$ drpcli contents list | jq '.[].meta.Name'
"BackingStore"
"LocalStore"
"DefaultStore"
"BasicStore"

shane
2017-12-29 17:55
right - so you do not have `drp-community-content` installed - so you either have to manually recreate all of the pieces that the new BootEnv you are adding rely on ...

shane
2017-12-29 17:55
or install `drp-community-content` first

shane
2017-12-29 17:55
if you want to create your own content pack - you need to take a look at the existing `drp-community-content` and recreate the pieces in it

shane
2017-12-29 17:56
we always validate on create and destroy that you aren't going to create an unusable system

shane
2017-12-29 17:56
so on destroy operations - if the piece of content you are trying to delete is in use by another content element - we won't delete it (without force)

shane
2017-12-29 17:56
same for create - you must add in the pieces and parts in order that are required by a content element

shane
2017-12-29 17:57
alternatively - you create a YAML or JSON content pack that contains everything in a single spec file - then it'll all be added at the same time

shane
2017-12-29 17:57
this is what the `drp-community-content` pack does - and why I suggest you look at it for inspiration
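
To illustrate the validate-on-create rule, here is a toy check in Python. It assumes a BootEnv lists its templates under `Templates` and refers to shared ones by an `ID` field; that shape is inferred from the `Templates[0]: No common template for ...` errors earlier and is not a confirmed schema. The function name is hypothetical.

```python
def missing_templates(bootenv: dict, known_templates: set) -> list:
    """List template IDs a bootenv references that are not yet in the system."""
    return [
        tmpl["ID"]
        for tmpl in bootenv.get("Templates", [])
        if "ID" in tmpl and tmpl["ID"] not in known_templates
    ]

# Template IDs taken from the ValidationError quoted earlier in the chat.
bootenv = {
    "Name": "centos-7-install",
    "Templates": [
        {"ID": "default-pxelinux.tmpl"},
        {"ID": "default-elilo.tmpl"},
        {"ID": "default-ipxe.tmpl"},
        {"ID": "centos-7.ks.tmpl"},
    ],
}

# On a system without the community content, nothing is known yet.
unresolved = missing_templates(bootenv, known_templates=set())
```

Every entry in `unresolved` would need to be created first (or shipped in the same content pack) before the BootEnv itself can be created.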

shane
2017-12-29 17:57
now - lets step back a bit first - though

shane
2017-12-29 17:58
what is your ultimate goal ?

shane
2017-12-29 17:58
just to get a custom Kickstart in place ?

stanislav.fomin
2017-12-29 17:59
Hm. I would prefer not to install "drp-community-content" - my goal is to have a controlled, repeatable dr-provision installation with bootenvs: "sledgehammer" - needed for facts gathering, and "centos-7" with my custom kickstart files.

shane
2017-12-29 18:00
sure - so basically you want to strip down the drp-community-content to JUST centos 7 and sledgehammer ?

shane
2017-12-29 18:00
the `drp-community-content` pack is not very big - it's just JSON spec for the bootenvs - and they take hardly any space in the system

stanislav.fomin
2017-12-29 18:01
I use "https://github.com/mrlesmithjr/ansible-digital-rebar" ansible role, it does not install drp-community-content

shane
2017-12-29 18:03
oh - nice - didn't know someone was putting an ansible role together for DRP ... :slightly_smiling_face:

stanislav.fomin
2017-12-29 18:03
So I prefer that all the YAML/kickstart files, all this stuff, live in a git repo of mine and are templated by ansible with jinja templates.

shane
2017-12-29 18:04
the ansible role can simply pull in the latest `drp-community-content` - which I'd highly suggest as there are a LOT of pieces of content included in it that you do not need to replicate

stanislav.fomin
2017-12-29 18:04
So I can parametrize/template every file, every config.

shane
2017-12-29 18:05
we've done a lot of that - and if you are interested in adding more Params work to the content packs - we'd really like to work with you to have that included back in to the DRP community content

shane
2017-12-29 18:05
that only makes the product better for you and for everyone else too

shane
2017-12-29 18:06
I have a Branch with some changes to allow for dynamic Kickstart or Preseeds to be specified in the existing content, so you can replace the stock supplied ones, without doing all of these cloning operations

stanislav.fomin
2017-12-29 18:06
BTW - the command line API of drpcli with "exists", "create", "update" is not very convenient for a regular idempotent ansible install. It would be better if "update" could create a new object. Now I have to do a lot of boring stuff: * check if the object exists * call update or create accordingly

shane
2017-12-29 18:06
via that change, you can specify a custom kickstart or preseed (depending on distro) to switch to instead of stock ones we provide

shane
2017-12-29 18:08
ah - but that pattern exists for a very very very good reason - it's fundamental to how our system operates - by creating a "Read Only" layer of content that can't (easily) be modified in the field - so your provisioning systems are not "drifting" in config from your ci/cd qa/dev whatever tested patterns have validated

shane
2017-12-29 18:09
the typical pattern is to develop your deployment content in your Dev/Test/QA/CI/CD lab/pipeline - and create a set of content that you deploy with your provisioner, lock the content read-only, and you know your field deployments content match your tested/validated content
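
For the ansible-side chore described above (check whether the object exists, then call update or create), a thin wrapper can hide the branching. This is only a sketch: it assumes `drpcli <kind> exists <id>` exits non-zero when the object is missing, that `update` accepts `<id> <json>` as arguments, and that `create -` reads the body from stdin as in the examples elsewhere in this chat. The function names are hypothetical.

```python
import subprocess

def run_drpcli(args: list, stdin: str = "") -> int:
    """Run drpcli and return its exit code (assumes drpcli is on PATH)."""
    return subprocess.run(["drpcli", *args], input=stdin, text=True).returncode

def ensure_object(kind: str, object_id: str, body: str, run=run_drpcli) -> str:
    """Create the object if it is missing, otherwise update it."""
    if run([kind, "exists", object_id]) == 0:
        run([kind, "update", object_id, body])
        return "updated"
    run([kind, "create", "-"], body)
    return "created"
```

The `run` parameter is injectable so the logic can be exercised without a live DRP endpoint. This does not bypass the read-only content layer: objects shipped in a locked content pack would still be rejected on update.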

stanislav.fomin
2017-12-29 18:10
OK, how can I pull the "drp-community-content" pack? I tried googling, but failed.

shane
2017-12-29 18:10
in this case - you'd create a Content Pack and deploy that content pack as one piece with the solution

shane
2017-12-29 18:10
a lot of your answers can be found in my `pkt-demo` BASH example, at: https://github.com/digitalrebar/provision/tree/master/examples/pkt-demo

shane
2017-12-29 18:11
specifically check out the `bin/control.sh` script - and the $CURL download patterns there

shane
2017-12-29 18:11
we also use a Catalog which will iterate the newest version of the content

shane
2017-12-29 18:11
so you can request the newest version of content in the Catalog, piece together your download URL from the Catalog

shane
2017-12-29 18:12
and get the most recent version of content for your system and hardware architecture (eg linux/mac/windows and arm/64bit, etc)


shane
2017-12-29 18:14
note in the Template - the `if` match statements need to be corrected - waiting on some other changes for those to be finished

shane
2017-12-29 18:17

stanislav.fomin
2017-12-29 18:19
Yes, the download works OK. But I cannot find a line of code showing how to "install" this whole "bundle"

shane
2017-12-29 18:20
`drpcli contents create - < drp-community-content.yaml`

shane
2017-12-29 18:20
if you haven't seen it yet, the `drpcli` binary has a built in help system

shane
2017-12-29 18:21
simply do: `drpcli <enter>` you get top level resources you can manipulate `drpcli contents <enter>` you get contents related operations ...etc...

stanislav.fomin
2017-12-29 18:21
Thank you. But there is no "drpcli contents" in these examples: /home/stas/bred/provision/examples>ack "drpcli contents"

shane
2017-12-29 18:22
we also have Shell autocompletion - but you have to install it


shane
2017-12-29 18:23
follow the "for ubuntu" example for CentOS - I have an updated Doc version that hasn't hit the website yet that cleans up that section correctly

shane
2017-12-29 18:27
not sure what your "examples" is referring to ?

stanislav.fomin
2017-12-29 18:33
My "examples" - the folder "examples" from https://github.com/digitalrebar/provision.git

stanislav.fomin
2017-12-29 18:33
I tried "sudo /usr/local/bin/drpcli autocomplete /etc/bash_completion.d/drpcli" but it still did not work... I will research it later

shane
2017-12-29 18:34
you have to "source" the autocomplete file after you create it

shane
2017-12-29 18:34
or log out and log back in

shane
2017-12-29 18:35
example: `source /etc/bash_completion.d/drpcli`

stanislav.fomin
2017-12-29 18:38
Of course I logged in again... but it fails...

shane
2017-12-29 18:38
does the /etc/bash_completion.d/drpcli exist/get created successfully ?

stanislav.fomin
2017-12-29 18:38
About params - where can I specify a default value, for example for the param "operating-system-disk"?

shane
2017-12-29 18:38
and check the permissions on it (644)

stanislav.fomin
2017-12-29 18:39
[vagrant@dr-provision ~]$ ls -l /etc/bash_completion.d/drpcli
-rw-r--r-- 1 root root 155339 Dec 29 18:25 /etc/bash_completion.d/drpcli
[vagrant@dr-provision ~]$

shane
2017-12-29 18:39
a param can be attached to a Machine - or a better pattern, is to create a Profile - and add the param and the value you want to that key in the Profile

shane
2017-12-29 18:39
then you attach the Profile to the machine

shane
2017-12-29 18:39
a Profile is a collection of Params that can all be applied to a Machine in one piece

stanislav.fomin
2017-12-29 18:40
Hmm. Is it possible to specify a default value for a param at the global, "content file" level?

shane
2017-12-29 18:40
you can attach multiple Profiles to any given Machine, and a Machine always includes the `global` profile by default

shane
2017-12-29 18:40
yes - you can add a Param to the `global` profile - but it will be applied to EVERY SINGLE machine that is provisioned

shane
2017-12-29 18:40
so if you have machines with different needs - this is not a good solution

stanislav.fomin
2017-12-29 18:41
(Current problem: "centos-7-install" from community-content installs all this stuff on /dev/sde, not /dev/sda)

shane
2017-12-29 18:42
yes - so you want to set a Param in either a Profile, or added directly to a Machine - to specify `operating-system-disk` is `sde`

shane
2017-12-29 18:42
wait

shane
2017-12-29 18:42
are you saying it's getting installed to `sde`, but you WANT it installed to `sda` ?

shane
2017-12-29 18:43
we by default try to install to `sda` - but we also check to try and determine what the first disk is in the system

shane
2017-12-29 18:43
I'd have to ping @vlowther to ask him how that determination is made for certain

stanislav.fomin
2017-12-29 18:44
can I specify "global profile" in content file?

greg
2017-12-29 18:44
@stanislav.fomin @shane - the current `operating-system-disk` value is used. If it is not specified it will be `sda`

greg
2017-12-29 18:45
Parameters are processed as @shane describes: Machine -> Profiles on Machine -> Global Profile -> Parameter default.
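
A minimal sketch of the lookup order greg describes (Machine -> Profiles on Machine -> Global Profile -> Parameter default). This is plain Python for illustration, not DRP code: the function name `resolve_param`, the dict shapes, and the choice to check attached profiles in list order are all assumptions.

```python
def resolve_param(name, machine_params, profiles, global_profile, param_defaults):
    """Return the first value found, checking the most specific source first.

    All arguments are plain dicts (or a list of dicts for `profiles`);
    these shapes are hypothetical and only mirror the order described above.
    """
    # 1. A Param set directly on the Machine wins.
    if name in machine_params:
        return machine_params[name]
    # 2. Then each Profile attached to the Machine (list order is an assumption).
    for profile in profiles:
        if name in profile:
            return profile[name]
    # 3. Then the built-in global profile.
    if name in global_profile:
        return global_profile[name]
    # 4. Finally the Parameter's own default, if one exists.
    return param_defaults.get(name)


# Example: a profile value overrides the global profile.
disk = resolve_param(
    "operating-system-disk",
    machine_params={},
    profiles=[{"operating-system-disk": "sde"}],
    global_profile={"operating-system-disk": "sda"},
    param_defaults={},
)
print(disk)  # sde
```

With nothing set anywhere the sketch returns None; per greg, the install then falls back to `sda` on its own.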

shane
2017-12-29 18:45
no - `global` is a built-in profile - it exists without you needing to create it

greg
2017-12-29 18:47
Parameter objects can have schema and schema can specify a default. We don't use that for content parameters now. We could. The Parameter default isn't displayed in aggregated views.

stanislav.fomin
2017-12-29 19:08
OK, probably it is my problem with disks order in BIOS. But is it possible to specify "os disk" somehow ("lshw -class disk"): { "SSD" in "product" and size?"256GB" } ?

greg
2017-12-29 19:10
well - this is a feature that would be interesting.

greg
2017-12-29 19:11
You could build a task that runs during discovery that inventories the system and sets the `operating-system-disk` parameter on the system based upon that kind of command.
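
A rough sketch of the selection logic such a discovery task might run, written in Python for illustration. Everything here is an assumption: the helper name `pick_os_disk`, the input shape (a list of dicts with `logicalname`, `product`, and `size` in bytes, loosely modeled on what a disk inventory tool could report), and the 10% size tolerance. Setting the resulting value on the machine is left out because that step is not shown in this thread.

```python
def pick_os_disk(disks, product_contains="SSD", target_bytes=256 * 10**9, tolerance=0.10):
    """Return the short device name (e.g. 'sde') of the first matching disk, or None.

    A disk matches when its product string contains `product_contains`
    (case-insensitive) and its size is within `tolerance` of `target_bytes`.
    """
    wanted = product_contains.lower()
    for disk in disks:
        product = str(disk.get("product", "")).lower()
        size = disk.get("size", 0)
        if wanted not in product:
            continue
        if abs(size - target_bytes) > tolerance * target_bytes:
            continue
        name = disk["logicalname"]
        prefix = "/dev/"
        # Strip the /dev/ prefix so the result looks like 'sda' or 'sde'.
        return name[len(prefix):] if name.startswith(prefix) else name
    return None


# Hypothetical inventory: one large HDD and one ~256GB SSD.
inventory = [
    {"logicalname": "/dev/sda", "product": "ST4000 HDD", "size": 4 * 10**12},
    {"logicalname": "/dev/sde", "product": "Samsung SSD 850", "size": 256060514304},
]
print(pick_os_disk(inventory))  # sde
```

The returned name is what would then be stored as `operating-system-disk` for that machine.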

shane
2017-12-29 19:13
this is why we have the Workflow system - which relies on Sledgehammer

shane
2017-12-29 19:13
you're asking for advanced provisioning and workflow operations - things that require knowledge of the system being provisioned

shane
2017-12-29 19:13
which is why the sledgehammer solution was developed and why we use it

shane
2017-12-29 19:14
you can do very interesting things like this - by simply creating a Task to accomplish this during the Workflow

greg
2017-12-29 19:26
@lae or @ctrees - what is the VM management system y'all use? I think you mentioned it before.

greg
2017-12-29 19:35
nvm - proxmox is it, I think. :slightly_smiling_face:

ctrees
2017-12-29 19:35
I've used most all of them... VMWare (and all it's) Hyper-V, Virtual Box, Vagrant, ProxMox...

ctrees
2017-12-29 19:38
... OpenStack... I just blew up VMWare Fusion... so was kicking it off my main mac... but most the time when I'm feeding stuff to others to learn... Vagrant/VBox

ctrees
2017-12-29 19:38
I've learned to hate them all

greg
2017-12-29 19:38
ok cool - feel the hate.

greg
2017-12-29 19:38
:slightly_smiling_face:

ctrees
2017-12-29 19:39
BUT... the Doc at UNI pretty much sticks with ProxMox

greg
2017-12-29 19:58
okay - cool.

stanislav.fomin
2017-12-29 20:36
My problem with "centos-7-install" from the community pack is that it tries to use all disks:
...
part swap --recommended
part pv.6 --size=1 --grow {{if .ParamExists "operating-system-disk"}}--ondisk={{.Param "operating-system-disk"}}{{end}}
...
What I want: put all OS stuff (root, swap, home) on /dev/sda and leave the other disks, for example for Ceph, like this: clearpart --all --initlabel --drives=sda

shane
2017-12-29 20:38
agree - that the kickstart has fairly limited disk install setup - this is where the "kickseed" and a custom kickstart stuff would help

shane
2017-12-29 20:38
or a better parameterized setup for the kickstart config - to feed disk partitioning to the kickstart

shane
2017-12-29 20:40
it's possible to pluck out the "partitioning" component of the Kickstart, and create individual templates for different partitioning schemes

shane
2017-12-29 20:40
then you could use a parameter to define which disk partitioning scheme to use - and that would call the correct template

shane
2017-12-29 20:41
that allows you to create a generic kickstart that has custom parameterized elements

shane
2017-12-29 20:41
you could even create a "custom" partitioning template that was basically empty - and the correct partitioning snippet could be defined as Parameter input

shane
2017-12-29 20:42
but having a collection of several pre-defined partitioning schemes is a really nice piece to have
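
A small sketch of the idea shane outlines: a parameter picks which partitioning snippet gets rendered into the kickstart. This is Python standing in for the real Golang templates, purely to show the shape. The param names `partition-scheme` and `partition-custom`, the scheme name `os-disk-only`, and the helper `render_partitioning` are all hypothetical; only `operating-system-disk` and the kickstart lines come from this thread.

```python
# Each scheme is a kickstart fragment; {disk} is filled from operating-system-disk.
SCHEMES = {
    "os-disk-only": (
        "clearpart --all --initlabel --drives={disk}\n"
        "part swap --recommended --ondisk={disk}\n"
        "part pv.6 --size=1 --grow --ondisk={disk}\n"
    ),
    # The 'custom' scheme is basically empty: the snippet itself arrives as a param.
    "custom": "{custom}",
}


def render_partitioning(params):
    """Render the partitioning fragment selected by the (hypothetical) scheme param."""
    scheme = params.get("partition-scheme", "os-disk-only")
    if scheme not in SCHEMES:
        raise ValueError("unknown partitioning scheme: %s" % scheme)
    return SCHEMES[scheme].format(
        disk=params.get("operating-system-disk", "sda"),
        custom=params.get("partition-custom", ""),
    )


print(render_partitioning({"operating-system-disk": "sda"}))
```

Keeping the generic kickstart separate from the fragments means a new layout is one more entry in the collection rather than a new kickstart.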

shane
2017-12-29 20:42
note that all of the content is based on the Golang text templating engine - so you can templatize and expand the results dynamically at run time

stanislav.fomin
2017-12-29 21:00
"Golang based text templating engine" - unfortunately, it looks like the Ansible/Jinja templating engine (with curly braces), so it's impossible to use "content.yaml" as an Ansible template.

vjcubas
2017-12-30 18:16
has joined #json

i.grischott
2017-12-31 22:18
Happy New Year @community :v: :wine_glass: Thanks to the DR-staff for your help and always fast response..

zehicle
2018-01-01 02:16
Same and thanks!

i.grischott
2018-01-01 21:00
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8M1FMZPF/image.png and commented: drpcli prefs set unknownBootEnv discovery defaultBootEnv sledgehammer defaultStage discover works normal..

shane
2018-01-01 21:02
@i.grischott yes there is....

shane
2018-01-01 21:03
@zehicle and @meshiest - I suspect that it's fallout from ui twiddling over the last week....

wdennis
2018-01-01 21:08
Happy New Year rebar-ites!

wdennis
2018-01-01 21:09
Maybe in 2018 we can think of a way to do automated UI testing?

i.grischott
2018-01-01 21:09
My testlab for testing cluster setup:

shane
2018-01-01 21:15
@wdennis - we'll gladly accept any checkins or code related to any and all testing efforts ...

wdennis
2018-01-01 21:23
@shane would it be hard in the UI for it to display the POST (etc) request that it is sending to the API server? (like in a 2-line bottom pane that could be opened/closed)

wdennis
2018-01-01 21:24
Might be a good way to audit what the UI is sending the API server (or is that all logged somewhere?)

i.grischott
2018-01-01 21:27
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8LSTDJUA/img_8644.jpg and commented: Testlab: 10 x DELL R610, each with 2x quad-port 1Gb NICs and 1x dual-port 10Gb, which I separated with 7 VLANs across 5 switches. There are different HDD sets on each server, some with SSDs, others with SSDs and large HDDs. What do you think, is it a good setup with VLANs? And is it possible to bond the NICs? How can I do this? How can I set different disk layouts on the machines? Is there a best practice guide? SSDs for the controller... My idea is to set up Kubernetes on bare metal, with OpenStack as the next layer on top. I think it could be useful to install Container Linux from CoreOS for the base system instead of Sledgehammer or CentOS, because Container Linux is very secure and small and has a good update strategy with a dual-partition system. What do you think about that? There is a way to install Ansible on it, maybe with ActivePython: https://vadosware.io/post/installing-python-on-coreos-with-ansible/ but I don't know about the security. Sorry for my noob questions.. the power is with you, master yoda :slightly_smiling_face:

wdennis
2018-01-01 21:29
@i.grischott I want your lab :slightly_smiling_face:

i.grischott
2018-01-01 21:33
:slightly_smiling_face:

i.grischott
2018-01-01 21:33
I want your knowledge

i.grischott
2018-01-01 21:34
ähh wisdom :wink:

2018-01-01 21:39
@wdennis commented on @i.grischott's file https://rackn.slack.com/files/U7U02J6LX/F8LSTDJUA/img_8644.jpg: What are the "OpenNet" things at the top? Modular patch-bays, or ??

vlowther
2018-01-01 21:42
@wdennis Speaking of auditing, the thing I have been working on over the holidays is nearing completion -- an in-memory log buffer with threaded per-request logging.

i.grischott
2018-01-01 21:43
yep. open-net is a vendor of patch panels.. and similar..

vlowther
2018-01-01 21:44
The log levels have changed from numbers to meaningful names, and the log levels set via dr-provision args or pref updates (via the UX) can be overridden on a per-request basis.

wdennis
2018-01-01 21:55
@vlowther sounds good.

vlowther
2018-01-01 21:56
and there will shortly be an audit log level :slightly_smiling_face:

wdennis
2018-01-01 21:56
I just wonder how to test / regression-test the UX elements?

vlowther
2018-01-01 21:56
mutters something about selenium and saucelabs

zehicle
2018-01-01 23:23
@wdennis the browser already records all that - just open the dev panel and look at the network traffic.

ctrees
2018-01-02 12:55
@wdennis on the UX testing... in previous life-times I used a lot of selenium.... I'm good with starting with that and probably some sort of cucumber top-level-document.... ideally work that in with read-the-docs...

ctrees
2018-01-02 12:58
The issue I've had in the past is 'words'... the 'wording' and 're-use' become pretty darn pivotal... so most of the time the code base dies the death of no-usage

ctrees
2018-01-02 13:06
I'm good with coding up some selenium / cucumber I'll attempt something this morning...

ctrees
2018-01-02 14:58
So... I'm taking some guesses... I'm going to assume nodejs as the backend for cucumber that drives webdriver... (there is a golang version... but I figure this is really for UX so js is probably the choice for UX stuff... I'll probably fire up Shayne's 5min test with packet target...

ctrees
2018-01-02 15:02
Since DRP is so command line... I'm thinking the test should 'check the command line' and use that to go find in the UX... and sort of helps my mental map of the cli and reflective UX...

shane
2018-01-02 17:34
- our first meetup of 2018 starts in about 90 minutes ... hope to see you all there ... agenda: https://docs.google.com/document/d/1cQsuWdkHQU-uHh0S3N9RqgYhHGW8gi5tb4fbE98OWvc meetup page: https://www.meetup.com/digitalrebar/events/xmrktnyxcbdb/


i.grischott
2018-01-03 11:34
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8LR3R25N/image.png and commented: With TIP v3.5.0-tip-22-3c3877c4bb5389bc92215a1bfa5cba22e39e8076 the Save button works, but not on Endpoint Management.

i.grischott
2018-01-03 11:35
what is the default root password on centos-7-install ? where can i set them?


ctrees
2018-01-03 14:11
7.3. Default Template Identity
These settings apply to TEMPLATES only, not the API.
The default password for the default o/s templates is RocketSkates
The default user for the default ubuntu/debian templates is rocketskates

ctrees
2018-01-03 14:12
So I think... user: rocketskates pw: RocketSkates

i.grischott
2018-01-03 14:12
thanks..

ctrees
2018-01-03 14:13
and the UX 'I think' has some refresh sync issues (which may be related to what you are seeing in the Save)

i.grischott
2018-01-03 14:15
@i.grischott uploaded a file: https://rackn.slack.com/files/U7U02J6LX/F8MLYU5L5/image.png and commented: i try to deploy k8s.. but i don't know my next steps.. :tired_face:

ctrees
2018-01-03 14:15
I was attempting to clear out old endpoints and the save didn't remove the endpoints from the UX portal..

shane
2018-01-03 14:15
@i.grischott - can you please submit an issue, and tag it "provision-ux-bug" - at: https://github.com/digitalrebar/provision/issues

i.grischott
2018-01-03 14:15
ok i try..

shane
2018-01-03 14:16
@ctrees - same for you - ticket please, we have some additional UI help right now, so if you can submit soonest, the issue should be addressed quickly - before @meshiest goes back to University ... :slightly_smiling_face:

ctrees
2018-01-03 14:16
I'll attempt the same for save endpoint...

shane
2018-01-03 14:16
also - try for root user: rebar1

shane
2018-01-03 14:17
to change it - view the kickstart seed - there is a Param that overrides the default - along with the Param that specifies to enable allowing Root SSH access after install - we HIGHLY SUGGEST NOT doing that - but that's up to your local policy

shane
2018-01-03 14:17
our recommendation is to use an SSH key - and put that in place

shane
2018-01-03 14:18
to use an SSH key - set the Param "access-keys" with a List of public key halves to add

shane
2018-01-03 14:19
search Slack for `access-key` and `access-ssh-root-mode` both greg and I have posted some info here - I'll draft a proper Doc page on this

shane
2018-01-03 14:19
(today)

shane
2018-01-03 14:21
thx @i.grischott we got the issue submission

greg
2018-01-03 14:28
centos is root/RocketSkates

greg
2018-01-03 14:28
ubuntu is rocketskates/RocketSkates

greg
2018-01-03 14:28
The parameter: `provisioner-default-password-hash` can be used to set the user's password

greg
2018-01-03 14:29
the parameter: `provisioner-default-user` can be used to set the default user on ubuntu/debian

greg
2018-01-03 14:29
The parameter: `provisioner-default-uid` can be used to set the uid of the default user on ubuntu/debian

greg
2018-01-03 14:30
The password hash needs to be generated per google. :slightly_smiling_face:

shane
2018-01-03 14:44
```mkpasswd -m sha-512 'my_password' `mktemp -u XXXXXXXXXXXXXXXX` ```

i.grischott
2018-01-03 16:00
is there a user-guide or howto for the ux to setup a k8s cluster... i don't know which selections of actions i need in which order. ..

greg
2018-01-03 16:06
@i.grischott - My order is this.

greg
2018-01-03 16:06
1. create your k8s-cluster profile, e.g. my-k8s-cluster

greg
2018-01-03 16:07
2. add a parameter to that profile: `krib/cluster-profile` = `my-k8s-cluster`

greg
2018-01-03 16:14
3. Using workflow editor, add the following workflow to the `my-k8s-cluster` profile.

greg
2018-01-03 16:18
a. centos-7-install -> runner-service:Success
b. runner-service -> finish-install:Stop
c. finish-install -> docker-install:Success
d. docker-install -> krib-install:Success
e. krib-install -> complete:Success
f. discover -> sledgehammer-wait:Success

greg
2018-01-03 16:18
The last entry is to handle discovery if you reimage the servers.

greg
2018-01-03 16:19
4. Add the profile to all the machines you want in the cluster.

greg
2018-01-03 16:19
5. Change stage on all the machines to `centos-7-install`

greg
2018-01-03 16:19
6. Reboot all the machines in your cluster.

greg
2018-01-03 16:19
then wait for them to get to complete.
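
The workflow in step 3 can be read as a simple map from the current stage to the next one. A rough Python sketch, for illustration only: `stage_sequence` is a hypothetical helper, not part of drpcli, and the text after the colon (`Success`, `Stop`) is carried in the map without being interpreted here.

```python
# The six workflow entries from step 3, as "current stage" -> "next stage:suffix".
STAGE_MAP = {
    "centos-7-install": "runner-service:Success",
    "runner-service": "finish-install:Stop",
    "finish-install": "docker-install:Success",
    "docker-install": "krib-install:Success",
    "krib-install": "complete:Success",
    "discover": "sledgehammer-wait:Success",
}


def stage_sequence(stage_map, start):
    """Follow the map from `start` until a stage has no successor."""
    sequence = [start]
    seen = {start}
    while sequence[-1] in stage_map:
        # Keep only the stage name; the suffix after ':' is ignored in this sketch.
        next_stage = stage_map[sequence[-1]].split(":", 1)[0]
        if next_stage in seen:
            raise ValueError("stage map loops back to %s" % next_stage)
        sequence.append(next_stage)
        seen.add(next_stage)
    return sequence


print(" -> ".join(stage_sequence(STAGE_MAP, "centos-7-install")))
# centos-7-install -> runner-service -> finish-install -> docker-install -> krib-install -> complete
```

Walking the map this way is a quick check that the stage set in step 5 actually leads to `complete`.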

i.grischott
2018-01-03 16:28
thanks very much..

zehicle
2018-01-03 17:48
we have videos of this process - nothing really documented. if you make notes about your install, we'll try to get it into the docs (pull requests welcome of course too)

shane
2018-01-03 17:48
...I'm actually writing that documentation ... right ... now ...

ctrees
2018-01-03 18:15
what's the timeout of the auth token on the UX portal ?

zehicle
2018-01-03 18:18
@ctrees 60 minutes right now. we're working to get auto-renew working. hopefully today (by EOW latest)

zehicle
2018-01-03 18:20
the endpoint tokens are 8 hours by default and renew at 4 hours.

ctrees
2018-01-03 18:24
I'm just attempting to script the UX login... having some successful failures :wink: but I think I need to add 'Given I am authenticated' function... you were right, you don't have good UX hooks... but there is a way in react to 'fix' that.... I don't see any 'RackN DSL' in the class tags... (I'm sort of shocked as all you guys seem to be very object and word use careful).... BTW... how do you test cli and api ? I saw only the one call in the .travis.yml but also see the code coverage call...

ctrees
2018-01-03 18:26
.... sorry too many questions in one blurb... it can wait till I push a test to explain my questions...

greg
2018-01-03 19:01
the `tools/test.sh` in the drp directory runs all the golang unit tests.

greg
2018-01-03 19:01
This tests the cli, api, and internal components.

greg
2018-01-03 19:01
It does pretty well generally. It doesn't test the UX.

ctrees
2018-01-03 19:22
thanks... since the UX is really just a reflection of the cli / api I bounce back and forth in a feature test... I about fired up a golang based function backend for the feature reg-ex... but the UX is so tied to js I went that way... BUT I'll dig into tools/test.sh and leverage what-ever is there for cli api

ctrees
2018-01-03 19:24
I need an excuse to learn go... this is the best excuse I've had all year :wink:

shane
2018-01-03 22:18
- I finished a first pass at the KRIB documentation - if anyone would like to take it for a spin and let me know if you run into any issues or questions ... would appreciate it ... http://provision.readthedocs.io/en/latest/doc/integrations/krib.html

zehicle
2018-01-04 00:32
@ctrees the org/endpoint update list should be working again.

ctrees
2018-01-04 15:21
So I've created a 'test user' and as I was doing both the sign up and sign in... I noticed the "client_id"... I take it that's tied to the session ? (aka it's the expire token that deals with backend on amazon stuff)...

ctrees
2018-01-04 15:24
...wait... I'm going to create an issue in github and just ping here...

ctrees
2018-01-04 15:44
@zehicle no happiness on a re-test https://github.com/digitalrebar/provision/issues/612

zehicle
2018-01-04 15:47
The front end change was in queue - just released it.

zehicle
2018-01-04 15:47
will take about 10 minutes

zehicle
2018-01-04 15:48
sadly the refresh token thing is fighting against AWS Cognito docs.

ctrees
2018-01-04 15:48
ok... I'll hit it later.. I just created a test user and getting scripts to work... so I should be able to regression Signup, Login and Logout (as that pretty generic).

zehicle
2018-01-04 15:49
awesome!

ctrees
2018-01-04 15:49
I'll dump details in issue 620 (just created)


ctrees
2018-01-04 17:54
@zehicle that re-direct client ID is session unique ?? correct ??


ctrees
2018-01-04 17:54
The client_id part

ctrees
2018-01-04 17:54
aka 'oauth-ish'

ctrees
2018-01-04 17:56
I think I get the reason... just messing with my head (and scripts) sorting out URL tracking stuff...

ctrees
2018-01-04 17:58
and the hover animations... are they sort of 'required' for event triggering or just for 'looks'

zehicle
2018-01-04 19:08
@ctrees yeah - it's a redirect that returns the JWT session token. we don't get the passwords at all, just the session token. we register the specific URLs that are allowed to do a redirect.

zehicle
2018-01-04 19:09
a future benefit is that we can use other SSOs to auth

ctrees
2018-01-04 19:21
I follow... what I was stumbling on is I need to pick up the token THROUGH the redirect BEFORE I attempt login... aka no real static 'login' page as I need to git the token first...

ctrees
2018-01-04 19:23
sorry... get token :wink: . Just need to add that to Page Object Login Function... ( it's not immutable ;-)

zehicle
2018-01-04 19:23
the token gets stored in the session, so you can recover it

zehicle
2018-01-04 19:24
that way it survives browser refresh

ctrees
2018-01-04 19:32
yup... but I'm starting the test from 'state-less' so I'll need to pick up the token each time for the browser session. BTW... with all this, I'd suggest NOT doing 'auto refresh' just make sure the browser session js dirties the cache well before token expire AND set a modal in the browser session that forces a re-auth... that lets the users setup deal with re-auth (aka don't attempt to help keep anything alive, just make sure the user see they need to re-auth)... you've got a 're-auth' built-in button I see.. ( that or I'm confused about "auto-renew" )

zehicle
2018-01-04 19:33
Cognito is supposed to issue a 30 day refresh token so that we don't have to keep forcing a login. we're working on that right now

zehicle
2018-01-04 19:44
once we get that @ctrees it may be able to just work w/o login once you have the refresh token stored

ctrees
2018-01-04 22:11
catmini:testyourlogin msops$ npm test
> test-your-login@0.0.1 test /Users/msops/Code/testyourlogin
> wdio
------------------------------------------------------------------
[chrome #0-0] Session ID: 70b95f2434c4362c75f4012ac95a0922
[chrome #0-0] Spec: /Users/msops/Code/testyourlogin/test/login.spec.js
[chrome #0-0] Running: chrome
[chrome #0-0]
[chrome #0-0] Login Page
[chrome #0-0]   ✓ should look nice
[chrome #0-0]   ✓ should let you login with valid credentials
[chrome #0-0]
[chrome #0-0]
[chrome #0-0] 2 passing (16s)
[chrome #0-0]
catmini:testyourlogin msops$

greg
2018-01-04 22:11
nice

ctrees
2018-01-04 22:11
as ALWAYS... it's something simple that causes me days :wink:

ctrees
2018-01-04 22:12
that and remembering yet-another-language-i-had-forgotten


ctrees
2018-01-05 03:48
@zehicle was right... the react stuff does make finding elements (actually waiting for them to appear) a bit harder, but there are ways around it and the webdriver community is active... I'm just glad they are moving off ruby ( npm sure helps )

ctrees
2018-01-05 03:53
I still have to figure out expected failure UX indicators and 'how to bubble problems up to the user'

2018-01-05 14:49
@rackneng ok back to this.... any info anywhere in this KRIB install?

2018-01-05 14:49
anyone... someone... bueller ...

shane
2018-01-05 14:49
complete documentation is done


2018-01-05 14:50
@rackneng i can do this on VMs right ?

2018-01-05 14:50
as a test bed?

2018-01-05 14:57
another thing id love to see rebar support is bare metal XENServer and Triton installs

shane
2018-01-05 14:58
we already support XENServer - you only need add appropriate packages to install it on top of an existing BootEnv - that's a minor Stage change to add the workflow

shane
2018-01-05 14:59
you can use VMs - but managing the power actions of VMs only natively works in DRP via VirtualBox - other hypervisors will work - but you may need to manage the Power (on/off/reboot) and PXE (next boot) options yourself

zehicle
2018-01-05 15:31
@outbackdingo can you give some more details about what type of support you are looking for? Is this installing XENserver? power mgmt of vms? detecting vms?

2018-01-05 16:49
@rackneng well basically, we have both XenServer and Triton nodes deployed, and OpenStack ..... so in a way im workin to integrate into a more reasonable way of deploying things KRIBs on bare metal is just another path.... and thats just for my infrastruture, i wount even begin to think about how my clients can utilie this like most server resellers provide

2018-01-05 16:49
and sorry some of my million mile keyboard keys seems to have gone on holiday again

2018-01-05 16:52
my vision was to use rebar to do provisioning for clients through a web based system like all other server resellers

2018-01-05 16:52
but dont see that as short term feasible

zehicle
2018-01-05 17:10
@outbackdingo are you PXE booting the VMs on those platforms? the KRIB process relies on the CLI/API, not on PXE. It would be possible to use it from VM images if your images started the CLI and registered the node. It's a reasonable use case that would require some tweaks to facilitate.

zehicle
2018-01-05 17:10
The running VMs with DRP is something we are trying to understand better and would happily have a 1x1 design discussion about.

zehicle
2018-01-05 17:11
(that goes for anyone in the community)

2018-01-05 17:16
well right now i use XOA to create a PXE vm let itt boot rebar sees it then provisions it then i use XOA to reboot it, then rebar gets it and installs X on it and done

2018-01-05 17:17
so thats been the process for me for VMs still using 2 tools... XOA and rebar......

2018-01-05 17:18
i dont care how it needs to be done now... i dont mind some pain.... however in the future i really think rebar could shine if it had a multi-tenant ui

2018-01-05 17:19
as for KRIBs itself... yes i would like to spin up a 5 VM system to kick the tires. I already have Triton SDC deployed so id like to compare them

2018-01-05 17:19
and well SDC itself is problematic now since it also uses PXE to install new bare metal nodes... :) YAY

detiber
2018-01-05 19:30
has joined #json

zehicle
2018-01-05 20:12
welcome @detiber!

detiber
2018-01-05 20:14
@zehicle thanks! I'm looking forward to kicking the tires with dr-provision

shane
2018-01-05 20:15
@detiber let us know if you have any questions ... and welcome

zehicle
2018-01-05 20:20
when you get far enough - we've got 2 k8s strategies w/ dynamic Ansible for Kubespray and the KRIB runner for Kubeadm.

zehicle
2018-01-05 20:25
@detiber RE: non-amd64 hosts. we don't have default images or a sledgehammer for them. it IS possible to detect arch and provide the right boot if we had the images. (adding @carl who has interest in ARM)

thays
2018-01-05 21:24
has joined #json

spector
2018-01-05 22:00
Welcome @thays

thays
2018-01-05 22:06
@spector Thanks! Can't wait to get it fired up and start building. Best place to look for any existing Saltstack integrations?

spector
2018-01-05 22:07
I will let the experts give you any info they have. @shane is the most likely to know

shane
2018-01-05 22:08
@thays we don't have any existing saltstack integrations at the moment - it's something on my personal bucket list to get done

shane
2018-01-05 22:08
however - if you wanted to tackle it - you could look at the `Ansible Reference` content as an example

shane
2018-01-05 22:09
coupled with how we handle secrets management in the `Kubernetes` (formerly named `krib`) content

shane
2018-01-05 22:09
the `Kubernetes` stuff shows how we handle tokens (in Saltstack's case it'd be pub/priv keys)

thays
2018-01-05 22:10
Cool..thanks I will check it out.

wdennis
2018-01-06 02:55
@shane you around?

wdennis
2018-01-06 02:56
Trying to add my krib profile to a new node I want to add to the cluster, getting a UX fail...

shane
2018-01-06 02:56
Nope :wink:

wdennis
2018-01-06 02:57
What does "Profile (at 0) does not exist" mean?

wdennis
2018-01-06 02:59
The `k8s-cluster1` profile absolutely does exist...

wdennis
2018-01-06 03:18
The UX is doing strange things to my data... It's putting quotes around the IPMI strings

wdennis
2018-01-06 03:19
As seen in UX:
```
ipmi/address: "idrac-796MQW1"
ipmi/password: ### obfuscated text ###
ipmi/username: "root"
```

wdennis
2018-01-06 03:19
Via CLI:
```
"Params": {
  "ipmi/address": "\"idrac-796MQW1\"",
  "ipmi/password": "xxxxxx",
  "ipmi/username": "\"root\""
}
```

wdennis
2018-01-06 03:21
Why is that happening?

wdennis
2018-01-06 03:21
It's screwing up IPMI (actions fail)
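
For reference, values shaped like the CLI output above are what you get when a string is JSON-encoded twice. Whether that is what the UX actually did here is not confirmed in this thread, so treat the following Python as a sketch of the symptom and a possible cleanup, with `strip_extra_quotes` being a hypothetical helper.

```python
import json


def strip_extra_quotes(value):
    """Undo one extra layer of JSON string encoding, if present.

    Values that are not a quoted JSON string are returned unchanged.
    """
    if isinstance(value, str) and len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        try:
            decoded = json.loads(value)
        except ValueError:
            return value
        if isinstance(decoded, str):
            return decoded
    return value


# Encoding "root" once more before storing it reproduces the CLI view.
stored = json.dumps("root")
print(json.dumps({"ipmi/username": stored}))  # {"ipmi/username": "\"root\""}

# The cleanup restores the plain value and leaves clean values alone.
print(strip_extra_quotes(stored))  # root
print(strip_extra_quotes("root"))  # root
```

A value with the literal quote characters embedded would be passed to the IPMI plugin as-is, which fits the failing actions.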

wdennis
2018-01-06 03:23
:frustrated:

shane
2018-01-06 03:39
looking back through UX changes to see if something might have triggered that

shane
2018-01-06 03:39
how did you create the profile ?

wdennis
2018-01-06 03:40
Thru the UX back a week ago; have successfully deployed 4 machines (a K8s cluster) with it...

shane
2018-01-06 03:40
you running v3.5.0 on endpoint - any updates to endpoint or content in the last week ?

wdennis
2018-01-06 03:41
yes, and no updates

wdennis
2018-01-06 03:44
My KRIB profile:
```
{
  "Available": true,
  "Description": "",
  "Errors": [],
  "Meta": {
    "color": "",
    "icon": "",
    "title": ""
  },
  "Name": "k8s-cluster1",
  "Params": {
    "access-keys": {
      "root": "ssh-rsa xxxxxx will@Wills-MacBook-Air"
    },
    "access-ssh-root-mode": "yes",
    "change-stage/map": {
      "docker-install": "krib-install:Success",
      "finish-install": "docker-install:Success",
      "krib-install": "complete:Success",
      "runner-service": "finish-install:Stop",
      "ssh-access": "runner-service:Success",
      "ubuntu-16.04-install": "ssh-access:Success"
    },
    "krib/cluster-admin-conf": {
      "apiVersion": "v1",
      "clusters": [
        {
          "cluster": {
            "certificate-authority-data": "xxxxxx",
            "server": "https://192.168.1.114:6443"
          },
          "name": "kubernetes"
        }
      ],
      "contexts": [
        {
          "context": {
            "cluster": "kubernetes",
            "user": "kubernetes-admin"
          },
          "name": "kubernetes-admin@kubernetes"
        }
      ],
      "current-context": "kubernetes-admin@kubernetes",
      "kind": "Config",
      "preferences": {},
      "users": [
        {
          "name": "kubernetes-admin",
          "user": {
            "client-certificate-data": "xxxxxx",
            "client-key-data": "xxxxxx"
          }
        }
      ]
    },
    "krib/cluster-join-command": "kubeadm join --token 7dff02.8b4d92135c936919 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:xxxxxx",
    "krib/cluster-master": "1bcd8472-6c20-47b3-b9ff-f32731905bf1",
    "krib/cluster-profile": "k8s-cluster1",
    "local-repo": false,
    "operating-system-disk": "sda",
    "provisioner-default-fullname": "xxxxxx",
    "provisioner-default-password-hash": "xxxxxx",
    "provisioner-default-user": "xxxxxx"
  },
  "ReadOnly": false,
  "Validated": true
}
```

wdennis
2018-01-06 03:44
(w/ obvious redactions)

wdennis
2018-01-06 03:48
Trying to IPMI powercycle the node, doing this: `$ drpcli -E https://192.168.1.148:8092 machines action 174c3987-22a4-43d4-9eb9-0247162e8628 powercycle`

wdennis
2018-01-06 03:48
Getting a response returned, but no powercycle happening...

wdennis
2018-01-06 03:50
This is the output from that command:
```
{
  "Command": "powercycle",
  "OptionalParams": [],
  "Provider": "ipmi",
  "RequiredParams": [
    "ipmi/username",
    "ipmi/password",
    "ipmi/address"
  ]
}
```
Also the return code is '0':
```
AirDennis:~ will$ echo $?
0
```

wdennis
2018-01-06 04:03
OK, confused now - looks like one of the powercycles did run; it did run thru the stage-map for this node, and is on krib-install now; however, it is trying to do the join, and I'm seeing many entries like this at the end of the (running) job log:
```
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Failed to connect to API Server "192.168.1.114:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "7dff02" is invalid for this cluster, can't connect
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Failed to connect to API Server "192.168.1.114:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "7dff02" is invalid for this cluster, can't connect
```

shane
2018-01-06 04:03
hmm - that is backend auth stuff with the Amazon Cognito pieces, I believe

shane
2018-01-06 04:04
oh wait

shane
2018-01-06 04:04
how long has this cluster been sitting ?

wdennis
2018-01-06 04:04
It's a k8s thing

shane
2018-01-06 04:04
It looks like the token is invalid as it's probably expired

shane
2018-01-06 04:05
to join the cluster

wdennis
2018-01-06 04:05
It's been up and running since 12/25

wdennis
2018-01-06 04:05
Ah, I see...

shane
2018-01-06 04:05
is 192.168.1.114 your endpoint ? and you set the API port to 6443 ?

wdennis
2018-01-06 04:05
It's the k8s master node

shane
2018-01-06 04:05
got it

wdennis
2018-01-06 04:06
So it's the k8s join token that's expired

wdennis
2018-01-06 04:08
So the join token stored in my profile is now invalid, I'm guessing...

shane
2018-01-06 04:08
yes

wdennis
2018-01-06 04:08
any way to refresh that?

shane
2018-01-06 04:08
I don't think we do anything with re-authing the tokens on our side - so that's going to be a k8s related issue

shane
2018-01-06 04:09
kubeadm has its rough edges

shane
2018-01-06 04:09
I'm at a friend's birthday party - so need to drop off - we can pursue this a bit further tmw

wdennis
2018-01-06 04:09
OK, have fun & thanks

ctrees
2018-01-06 04:30
so @wdennis was that " stuff just when you pulled things up in the UI editor ?

ctrees
2018-01-06 04:33
... I'm slowly learning react and re-learning what I forgot with selenium / webdriver in an attempt to UI test (and perform mechanical rob/shane tricks)

wdennis
2018-01-06 04:43
No, it was in the actual data (see above for CLI output)

wdennis
2018-01-06 04:45
God bless you @ctrees for taking a stab at UX testing

ctrees
2018-01-06 04:45
yea, I saw that... though you inverted the header.... as the screen shot had the escapes...

wdennis
2018-01-06 04:49
I have almost bailed on using the UX so many times due to bugs creeping in from constant dev

wdennis
2018-01-06 04:50
But it's too pretty :dancer::skin-tone-3::joy:

ctrees
2018-01-06 04:51
I got login working then I had to re-write ... and I've got motivation from work...

greg
2018-01-06 05:46
@wdennis - the kubeadm join token is valid for 24 hours by default, I think. So you can only grow your cluster for a day. It is lame. I think we could add a parameter to the master config to change that. It may be that a token can be reissued as well by running kubeadm. These would be good future enhancements.
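
A quick sketch of the arithmetic, assuming the 24 hour lifetime greg mentions (he is not certain of it) and assuming the token was created when the cluster came up on 12/25. The helper `token_expired` is hypothetical and does not talk to kubeadm; it only compares timestamps.

```python
from datetime import datetime, timedelta


def token_expired(created, now, ttl=timedelta(hours=24)):
    """Return True when `now` is at or past `created + ttl`."""
    return now >= created + ttl


# Cluster (and, by assumption, its join token) created on 2017-12-25;
# the join attempt in this thread happened around 2018-01-06 04:03.
created = datetime(2017, 12, 25, 0, 0)
attempt = datetime(2018, 1, 6, 4, 3)
print(token_expired(created, attempt))  # True
```

Under those assumptions the join command stored in the profile stopped being usable on 12/26, well before the new node tried it.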

ctrees
2018-01-06 05:49
The SSL certificate used to load resources from https://qww9e4paf1.execute-api.us-west-2.amazonaws.com will be distrusted in M70. Once distrusted, users will be prevented from loading these resources. See https://g.co/chrome/symantecpkicerts for more information.

ctrees
2018-01-06 05:50
that came through the chrome browser dev console... as a warning....

wdennis
2018-01-06 14:26
Need to know how to terminate a running job (the `krib-install` stage)

wdennis
2018-01-06 14:27
Since it'll never succeed in finishing...

ctrees
2018-01-06 14:29
krib keeps the runner running.... can't you just put a stop or ?? into the queue and the runner picks it up ?

wdennis
2018-01-06 14:29
I have set "Runnable" to false on the node, but the drpcli process is still running `kubeadm` on the node

wdennis
2018-01-06 14:30
@ctrees That's what I don't know...

ctrees
2018-01-06 14:30
I'm just guessing as you've been playing with it way more than I... it's all just 'in theory' in my head...

wdennis
2018-01-06 14:30
I could -TERM kubeadm on the node, but not sure that's the right way to do it...

ctrees
2018-01-06 14:32
so in the krib demo... did he leave the runner going after kubeadm was installed ?

ctrees
2018-01-06 14:33
at the time, I was thinking OH this avoids ALL touches other than sledgehammer, but I was not sure how or if the runner migrated off sledgehammer

ctrees
2018-01-06 14:38
in my mind... there is a drp process ON the node IN sledgehammer that KNOWS it's a particular NODE, then there is the drp process that IS and knows it IS an endpoint and has runner queue info for the node

wdennis
2018-01-06 14:39
Yes, the runner (drpcli "service") continues to run on the nodes

wdennis
2018-01-06 14:39
The "runner" is not tied to SH

wdennis
2018-01-06 14:40
It's a service that runs on SH

wdennis
2018-01-06 14:40
But you can also install/keep it running on any other install

wdennis
2018-01-06 14:41
That's the difference (AFAIK) between the `complete` (keeps runner running) and `complete-nowait` (terminates runner) stages

ctrees
2018-01-06 14:42
ah...

wdennis
2018-01-06 14:42
The KRIB stage-map ends with the `complete` stage

ctrees
2018-01-06 14:42
but the endpoint is the 'queue repo' for the runner and the runner can dynamically update ?? right ??

wdennis
2018-01-06 14:43
Here's my new node's process tree:
```
root@k8s-ingress:~# pstree
systemd─┬─accounts-daemon─┬─{gdbus}
        │                 └─{gmain}
        ├─acpid
        ├─2*[agetty]
        ├─atd
        ├─cron
        ├─dbus-daemon
        ├─dhclient
        ├─dockerd─┬─containerd───10*[{containerd}]
        │         └─10*[{dockerd}]
        ├─drpcli─┬─bash───kubeadm───10*[{kubeadm}]
        │        └─11*[{drpcli}]
        ├─irqbalance
        ├─2*[iscsid]
        ├─lvmetad
        ├─lxcfs───3*[{lxcfs}]
        ├─mdadm
        ├─polkitd─┬─{gdbus}
        │         └─{gmain}
        ├─rsyslogd─┬─{in:imklog}
        │          ├─{in:imuxsock}
        │          └─{rs:main Q:Reg}
        ├─snapd───9*[{snapd}]
        ├─sshd───sshd───bash───pstree
        ├─systemd───(sd-pam)
        ├─systemd-journal
        ├─systemd-logind
        ├─systemd-timesyn───{sd-resolve}
        └─systemd-udevd
```

wdennis
2018-01-06 14:43
See the `drpcli` process? That's the "runner"

wdennis
2018-01-06 14:45
Yes, AFAIK, you can put more jobs into the "hopper" for the node on the DRP server-side, and the node's runner will dutifully pick them up and start executing them

wdennis
2018-01-06 14:46
I need to tell the runner somehow to terminate the `kubeadm` process... It's just looping endlessly with:

ctrees
2018-01-06 14:46
so back to your issue, you can't tell the state of the "hopper" on the node side, but you should be able to see the "hopper queue" on the endpoint side...

wdennis
2018-01-06 14:46
```
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Failed to connect to API Server "192.168.1.114:6443": there is no JWS signed token in the cluster-info ConfigMap. This token id "7dff02" is invalid for this cluster, can't connect
```

wdennis
2018-01-06 14:47
```
AirDennis:~ will$ drpcli jobs log 463014d6-60ef-43be-848c-406296037716 -E https://192.168.1.148:8092 | strings | grep Failed | wc -l
7858
```

wdennis
2018-01-06 14:48
:joy: actually :cry:

ctrees
2018-01-06 14:52
well... it's less than the newly discovered prime number :wink:

wdennis
2018-01-06 14:53
LOL

wdennis
2018-01-06 14:54
At least yet...

ctrees
2018-01-06 14:54
Oh... krib vs kubespray ? opinion ? I was liking the idea of handing off to ansible

ctrees
2018-01-06 14:55
but I sort of see the krib motivation too...

wdennis
2018-01-06 14:55
From what I gather from @zehicle and what I've been reading about k8s so far, `kubeadm` is supposed to be the blessed k8s orchestration tool

wdennis
2018-01-06 14:56
It does not handle HA controllers yet, as `kubespray` does, but I don't need that for what I'm doing (PoC cluster)

ctrees
2018-01-06 14:57
yea... but spray can install kubeadm unless I'm not following... I was wondering if security (no ssh) is an issue

wdennis
2018-01-06 14:57
So I went with KRIB as it's a DRP one-pass solution

wdennis
2018-01-06 14:58
And the "new hotness" :stuck_out_tongue_winking_eye:

ctrees
2018-01-06 14:59
I get that... and 'look ma, no-ssh'

wdennis
2018-01-06 14:59
And faster, AFAIK - Ansible is slow many times

wdennis
2018-01-06 15:02
Oh well, here goes nothing...

ctrees
2018-01-06 15:02
trying the hard boot option ?

wdennis
2018-01-06 15:02
```
root@k8s-ingress:~# ps auxww | grep kubeadm
root      4046  0.0  0.3 107164 59104 ?        Sl   03:37   0:14 kubeadm join --token 7dff02.8b4d92135c936919 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2
root     12163  0.0  0.0  14224  1092 pts/0    S+   15:01   0:00 grep --color=auto kubeadm
root@k8s-ingress:~# kill -TERM 4046
root@k8s-ingress:~# ps auxww | grep kubeadm
root     12217  0.0  0.0  14224   932 pts/0    S+   15:02   0:00 grep --color=auto kubeadm
```

wdennis
2018-01-06 15:03
Now to see the job state...

wdennis
2018-01-06 15:06
Yup, "failed":
```
AirDennis:~ will$ drpcli jobs show 463014d6-60ef-43be-848c-406296037716 -E https://192.168.1.148:8092
{
  "Archived": false,
  "Available": true,
  "Current": true,
  "EndTime": "2018-01-06T05:45:16.709533598-05:00",
  "Errors": [],
  "ExitState": "failed",
  "Machine": "174c3987-22a4-43d4-9eb9-0247162e8628",
  "Meta": {},
  "Previous": "82f2c4cc-2df2-45ea-a33f-a4cc1a0769a5",
  "ReadOnly": false,
  "Stage": "krib-install",
  "StartTime": "2018-01-05T18:20:06.89678544-05:00",
  "State": "failed",
  "Task": "krib-install",
  "Uuid": "463014d6-60ef-43be-848c-406296037716",
  "Validated": true
}
```

ctrees
2018-01-06 15:08
looks as if the runner reported back in

ctrees
2018-01-06 15:09
do you know what "Previous" is ?

ctrees
2018-01-06 15:10
... I sure learn a lot when you're in pain...

wdennis
2018-01-06 15:10
I think "the job before this one"

ctrees
2018-01-06 15:11
OH... that makes sense...

wdennis
2018-01-06 15:11
Oh good, glad I can help :stuck_out_tongue_winking_eye:

wdennis
2018-01-06 15:11
It is the way of OpenSource(tm)

ctrees
2018-01-06 15:12
... and people with disorders...

ctrees
2018-01-06 15:12
anyway... did the runner take a new job ?

wdennis
2018-01-06 15:13
Nope, on a fail it terminates the job queue

ctrees
2018-01-06 15:13
OH...

wdennis
2018-01-06 15:13
There was only one more stage, which was `complete` anyways

wdennis
2018-01-06 15:13
The runner service (drpcli) is still running on the node

wdennis
2018-01-06 15:14
So that's where `complete` would leave it anyhow

ctrees
2018-01-06 15:15
Oh... that was my question, can you give the runner a new queue task ? or is the only way to 're-cycle' a hard boot... and do you know if the node joined the cluster (I assume not... ? right ?)

wdennis
2018-01-06 15:16
a) Yes, I think I could give the runner new jobs (not sure how that would work on the DRP server-side)

ctrees
2018-01-06 15:16
aka... I like this problem as it answers a lot of 'process questions' in my head... mostly DR (disaster recovery)

wdennis
2018-01-06 15:17
b) The node did not join b/c kubeadm could not auth to the k8s API server with the expired token

ctrees
2018-01-06 15:17
to me... DRP -> Disaster Recovery Protocol :wink:

wdennis
2018-01-06 15:18
KRIB (per @greg's message to me above) only gets a 24h join token, and once it expires, there's currently no way for the DRP system to renew it and store the new one in the profile

wdennis
2018-01-06 15:18
Sadly, I did not know that before I started...

ctrees
2018-01-06 15:19
... yea... expiring and auto rotating tokens is the new hotness ya know...

wdennis
2018-01-06 15:20
Although it did prep my node for k8s... So that's cool

wdennis
2018-01-06 15:20
token rotation = a good idea

ctrees
2018-01-06 15:21
but if the runner failed on that node, the only option to do anything is hard reboot ?? right ??

ctrees
2018-01-06 15:22
then I get krib more... as the only way in is drp and/or kubeadm

wdennis
2018-01-06 15:22
Not the runner that fails, but the job

ctrees
2018-01-06 15:23
Yup, got that (in your debug instance)...

wdennis
2018-01-06 15:24
I'm not sure if they set up a `systemd` service for drpcli where it will auto-start it on boot...

wdennis
2018-01-06 15:26
Yes, they do:
```
root@k8s-ingress:~# systemctl list-unit-files --type=service | grep drpcli
drpcli-init.service                        enabled
drpcli.service                             enabled
```

ctrees
2018-01-06 15:27
... that's what I THOUGHT was going on... but they do that in the install dynamically via SH

wdennis
2018-01-06 15:28
Yeah, I think in the post-install shell script

ctrees
2018-01-06 15:28
cool stuff and all because your ssl expired :wink:

wdennis
2018-01-06 15:28
"No pain, no gain" I guess

ctrees
2018-01-06 15:29
... the UX testing has now 'distracted' me (in a good way) for 2 weeks...

ctrees
2018-01-06 15:32
I got it taking pretty pictures AND comparing those pictures to base image... (visual diff)

wdennis
2018-01-06 15:34
I was thinking that the way to test the UX would be to programmatically "work" the UX and see what resulting REST command gets sent to the DRP server

wdennis
2018-01-06 15:35
But I'm not a web dev guy so what do I know?

ctrees
2018-01-06 15:36
yup... THAT's exactly what I want to do

ctrees
2018-01-06 15:37
that, and issue drpcli commands and see the results in the UI

ctrees
2018-01-06 15:38
should be able to test the rest api the same way... end up with 5-6 language mappings

ctrees
2018-01-06 15:40
lang meaning: drpcli, drp-golang, drp-api, drp-css, drp-bdd, drp-feature

wdennis
2018-01-06 15:42
The real way (IMHO) would be to use TDD as the UX dev style

wdennis
2018-01-06 15:42
Write a test case, ensure it fails, devel the feature, ensure the test passes

ctrees
2018-01-06 15:43
they've got unit tests in golang and rest-api tests, I hope to use that as the UX basis...

ctrees
2018-01-06 15:45
yea TDD, BDD all great ideas... but if you think about it... ansible is basically that for cli (and cli is what.... 40 years old)....

ctrees
2018-01-06 15:47
anyway... my 'GOAL' is to make a UX test script setup that you can use to file bugs and/or 'capture' demos... realistically the UI is really just good documentation for the cli :wink:


ctrees
2018-01-06 15:51
that was my first attempt.... have 3 others (playing with other tools / structures) most are investigations for 'real work' but using drp as proving grounds...

greg
2018-01-06 15:51
Okay jobs and stuff.

greg
2018-01-06 15:52
kubeadm hung in a loop causing the task to run forever.

greg
2018-01-06 15:52
This is a task design issue and kubeadm issue.

greg
2018-01-06 15:53
regardless, the runner doesn't have a watchdog control to kill or stop tasks. That is an interesting idea to think about.

ctrees
2018-01-06 15:53
is the expire token thing aws ? or ?

greg
2018-01-06 15:54
Regardless, when @wdennis killed the kubeadm, the runner saw it as a failure, marked the node as not runnable (runnable == false), and went to sleep waiting for runnable to become true.

wdennis
2018-01-06 15:54
Or (it seems) an admin way to direct the runner to terminate the current job and move on (not sure that's always a good idea, but the admin can make that call I guess)

greg
2018-01-06 15:55
The job for that task in the machine's task list was marked failed and the task list index was left where it was.

greg
2018-01-06 15:55
The purpose of this is that remediation can occur, then the machine can be marked runnable, and task processing will start where it left off.

wdennis
2018-01-06 15:55
Yes, b/c I went into the node and manually killed the process that drpcli had spawned

wdennis
2018-01-06 15:56
I did (via UX) tell the machine to stop the runner (Runnable = false) but that did not seem to work

greg
2018-01-06 15:56
kubeadm is running the hard loop inside itself apparently. We don't run the kubeadm calls in a loop.

wdennis
2018-01-06 15:57
Oh sure, I get that

greg
2018-01-06 15:57
That is only checked when drpcli gets a chance.

greg
2018-01-06 15:57
Remember, DRP doesn't touch machines. Machines touch DRP.

wdennis
2018-01-06 15:57
I thought it may "systemctl stop drpcli.service" or something

wdennis
2018-01-06 15:57
Pull, not push

greg
2018-01-06 15:57
Yes

wdennis
2018-01-06 15:58
But there should be a way to centrally control the runner on the machines, right?

greg
2018-01-06 15:58
yes and no.

wdennis
2018-01-06 15:59
"it depends" :stuck_out_tongue_winking_eye:

greg
2018-01-06 15:59
Yes - mark the machine runnable and it will eventually stop barring badness.

greg
2018-01-06 15:59
No - DRP has no way to force the stop if something hangs.

wdennis
2018-01-06 15:59
OK, not following that...

greg
2018-01-06 15:59
Well - who knew that kubeadm would loop forever on a bad token.

greg
2018-01-06 16:00
That is unknown badness.

wdennis
2018-01-06 16:00
"mark the machine runnable and it will eventually stop" seems backwards to me...

greg
2018-01-06 16:00
sorry - not runnable

ctrees
2018-01-06 16:00
DRPe - Endpoint, DRPn - Node (I thought the runner is the same golang bin but knows it's on a node ? correct ?)

wdennis
2018-01-06 16:00
Phew :slightly_smiling_face:

wdennis
2018-01-06 16:00
Yes, got you - once the runner kicks off a child process, it's hard to know what that process is actually doing...

ctrees
2018-01-06 16:01
... cool @wdennis file a kubeadm bug !

wdennis
2018-01-06 16:01
Except when you get a return code to the parent

greg
2018-01-06 16:03
@ctrees - DRPn is drpcli running `machines processjobs <uuid>`

greg
2018-01-06 16:04
Yes - @wdennis

wdennis
2018-01-06 16:06
What would happen if the runner service was stopped in the middle of it processing a job queue?

greg
2018-01-06 16:06
now, with some of the new websocket stuff, we can have the drpcli create an event stream watching the job and the machine. If the machine becomes not runnable or the job gets marked stopped, we could kill the process and close out things. There are some perils with this because of timing, but it could be made to work and isn't a bad idea. Should note it as a feature to deal with.
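A minimal sketch of the watchdog idea, assuming a plain wall-clock deadline instead of the websocket event stream described above. `run_with_deadline` is a hypothetical helper for illustration, not part of drpcli.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline_s):
    """Run cmd and kill it if it outlives deadline_s seconds.

    Returns (state, returncode), where state is "finished" or "killed".
    Hypothetical helper; a real runner would react to job/machine events.
    """
    proc = subprocess.Popen(cmd)
    try:
        rc = proc.wait(timeout=deadline_s)
        return ("finished", rc)
    except subprocess.TimeoutExpired:
        proc.kill()   # the child is stuck, e.g. looping on a bad token
        proc.wait()   # reap it so no zombie is left behind
        return ("killed", proc.returncode)
```

Calling `run_with_deadline([sys.executable, "-c", "import time; time.sleep(60)"], 1.0)` should come back as `"killed"` after about a second. Note that this only kills the direct child; with a tree like drpcli → bash → kubeadm, a real watchdog would likely need to signal the whole process group.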

greg
2018-01-06 16:06
It would leave the job running in DRP.

wdennis
2018-01-06 16:07
i.e. you have this going on: `drpcli─┬─bash───kubeadm───10*[{kubeadm}]` and then you do a `systemctl stop drpcli.service` on the node

greg
2018-01-06 16:07
Once the runner was restarted and asked for a job (assuming the machine was runnable), DRP would mark the "open" job as failed, create a new job instance of the current task, and try again.

wdennis
2018-01-06 16:07
Ah, so that wouldn't work

greg
2018-01-06 16:08
well - it works fine for `node got powercycled accidentally` case. But in this case, no.

wdennis
2018-01-06 16:08
Yup, see that

greg
2018-01-06 16:09
give me a minute to look up something.

wdennis
2018-01-06 16:10
So there's no way of administratively telling the DRP system "hey, stop processing the current job, mark it as 'admin terminated', and move ahead with the next job"?

wdennis
2018-01-06 16:11
(would take coordination with the node's `drpcli` to restart it or something)

greg
2018-01-06 16:13
well.

wdennis
2018-01-06 16:13
Again, not sure that's a great idea, but if the admin knows the current job is, for whatever reason, not proceeding as planned (like I saw from the job log output), and in their estimation it could be safely skipped to proceed with the next job in the queue, why shouldn't they have the power to do that?

greg
2018-01-06 16:13
You could do this, but not as a single action from DRP UX or CLI.

greg
2018-01-06 16:14
1. mark machine not runnable.

greg
2018-01-06 16:14
2. kill hung task / restart runner

greg
2018-01-06 16:14
3. Set currentTask to currentTask + 1 on the machine (I think this still works).

greg
2018-01-06 16:14
4. make machine runnable.

greg
2018-01-06 16:15
That would give you an "administrative" stop and move on to the next task.
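The four steps above can be sketched as a toy in-memory model. This does not talk to DRP; it only shows the order of operations on a dict shaped like a machine object. The `CurrentTask` key name and the `kill_hung_task` callback are assumptions for illustration.

```python
def admin_skip(machine, kill_hung_task):
    """Apply the four-step administrative skip to a machine-like dict.

    `machine` needs "Runnable" and "CurrentTask" keys (assumed names).
    `kill_hung_task` is a caller-supplied callable, since step 2 has no
    DRP-side implementation. Returns a new dict; the input is untouched.
    """
    m = dict(machine)
    m["Runnable"] = False                    # 1. mark machine not runnable
    kill_hung_task()                         # 2. kill hung task / restart runner
    m["CurrentTask"] = m["CurrentTask"] + 1  # 3. advance the task index
    m["Runnable"] = True                     # 4. make machine runnable again
    return m
```

Keeping step 2 as a callback reflects that killing the hung process has to happen on the node itself, outside of anything the DRP endpoint can do.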

wdennis
2018-01-06 16:15
Couldn't that be orchestrated?

greg
2018-01-06 16:15
#2 is not implemented today.

greg
2018-01-06 16:16
but yes eventually.

wdennis
2018-01-06 16:16
I did do 1); I did 1st half of 2)

greg
2018-01-06 16:16
Okay remediating the expired token in krib.

greg
2018-01-06 16:16
Do you have your k8s admin node running still?

wdennis
2018-01-06 16:16
Yup

greg
2018-01-06 16:17
Can you do this for me from there? `kubeadm token list`

wdennis
2018-01-06 16:18
yup hold on --

greg
2018-01-06 16:18
ok

shane
2018-01-06 16:18
seems to me - you should be able to regenerate the token on Master - and manually inject that back in to the Profile, and then machines could join after that was completed

shane
2018-01-06 16:19
(regenerate token on k8s master)

greg
2018-01-06 16:19
yep - that is what I'm walking @wdennis through. Just collecting the commands.

wdennis
2018-01-06 16:19
```
root@testnode03:~# kubeadm token list
TOKEN     TTL       EXPIRES   USAGES    DESCRIPTION   EXTRA GROUPS
root@testnode03:~#
```

greg
2018-01-06 16:19
@wdennis - `kubeadm token create`

wdennis
2018-01-06 16:20
```
root@testnode03:~# kubeadm token create
866bc1.51f5919fd2dfd63e
root@testnode03:~#
```

greg
2018-01-06 16:20
On the DRP endpoint,

greg
2018-01-06 16:20
you can do this:

greg
2018-01-06 16:20
`drpcli profiles show <k8s cluster profile>`

wdennis
2018-01-06 16:21
Yes?

greg
2018-01-06 16:21
I need to see the join command in the `krib/cluster-join-command`

wdennis
2018-01-06 16:21
```"krib/cluster-join-command": "kubeadm join --token 7dff02.8b4d92135c936919 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2",```

greg
2018-01-06 16:22
I believe you can do this now:

greg
2018-01-06 16:22
```drpcli profiles set <k8s-cluster-profile> param krib/cluster-join-command to "kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2"```

greg
2018-01-06 16:23
Note: I replaced the old token (7dff02.8b4d92135c936919) with a new token (866bc1.51f5919fd2dfd63e)
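A minimal sketch of that token swap done as a string edit rather than by hand, assuming the stored join command carries exactly one `--token` argument. `swap_join_token` is a hypothetical helper; the pattern relies on kubeadm bootstrap tokens having the form of 6 lowercase alphanumerics, a dot, then 16 more.

```python
import re

# Matches "--token <id>.<secret>"; it does not match the
# --discovery-token-ca-cert-hash flag, which has no "--token " prefix.
TOKEN_RE = re.compile(r"(--token\s+)[a-z0-9]{6}\.[a-z0-9]{16}")


def swap_join_token(join_cmd, new_token):
    """Return join_cmd with the value after --token replaced by new_token."""
    updated, count = TOKEN_RE.subn(lambda m: m.group(1) + new_token, join_cmd)
    if count != 1:
        raise ValueError("expected exactly one --token argument, found %d" % count)
    return updated
```

The result is the string that would then be written back to the `krib/cluster-join-command` param on the cluster profile; the CA cert hash is left untouched.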

greg
2018-01-06 16:23
You should then be able to mark the stopped node as Runnable and it should install and join.

greg
2018-01-06 16:24
Also, can you run a `kubeadm token list` again. I want to see the reported info.

wdennis
2018-01-06 16:26
```
root@testnode03:~# kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
866bc1.51f5919fd2dfd63e   23h   2018-01-07T16:19:56Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
root@testnode03:~#
```
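A small sketch of how the EXPIRES column could be checked before a join is attempted, assuming the timestamp is a UTC string in the format shown above. `token_time_left` is a hypothetical helper, not a kubeadm or drpcli feature.

```python
from datetime import datetime, timezone


def token_time_left(expires, now):
    """Return the timedelta from `now` until `expires`.

    `expires` is a UTC timestamp string like "2018-01-07T16:19:56Z";
    `now` must be a timezone-aware datetime. A negative result means
    the token has already expired.
    """
    expiry = datetime.strptime(expires, "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc) - now
```

A pre-join check along these lines would have caught the expired token up front instead of leaving `kubeadm join` to retry indefinitely.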

wdennis
2018-01-06 16:27
I do not need to update the `--discovery-token-ca-cert-hash` right?

shane
2018-01-06 16:27
no the CA Cert Hash should be the same

wdennis
2018-01-06 16:27
Cool

shane
2018-01-06 16:28
unless the Cert has expired, too ... :slightly_smiling_face:

wdennis
2018-01-06 16:28
LOL

shane
2018-01-06 16:28
but in future when you create a token you should be able to set a longer expiry time with `--token-ttl` on `kubeadm init`, or `--ttl` on `kubeadm token create` (if I'm reading the docs right)

wdennis
2018-01-06 16:30
```
AirDennis:~ will$ drpcli profiles set k8s-cluster1 param krib/cluster-join-command to "kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2" -E https://192.168.1.148:8092
"kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2"
```

wdennis
2018-01-06 16:30
Let's give it a go then!

wdennis
2018-01-06 16:31
How to use drpcli to set the machine runnable?

wdennis
2018-01-06 16:31
(Sorry, not trusting the UX right now...)

wdennis
2018-01-06 16:35
n/m, got it

greg
2018-01-06 16:35
`drpcli machines update <uuid> '{ "Runnable": true }'`

wdennis
2018-01-06 16:35
```
Running: kubeadm join --token 866bc1.51f5919fd2dfd63e 192.168.1.114:6443 --discovery-token-ca-cert-hash sha256:af6edbddbffb904491599a03362d8deebeee68ffa170d20daffa2c6a70acb9b2
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[discovery] Trying to connect to API Server "192.168.1.114:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.1.114:6443"
[discovery] Requesting info from "https://192.168.1.114:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.1.114:6443"
[discovery] Successfully established connection with API Server "192.168.1.114:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.
Finished successfully
Command exited with status 0
Action krib-install.sh.tmpl finished
Task krib-install finished
Updated job 9d7d2564-26b4-439e-a049-c5b959b6da32 to finished
```

shane
2018-01-06 16:35
or: `drpcli machines set bff5513f-7f63-43c6-b744-5eefaa9716be param Runnable to true`

wdennis
2018-01-06 16:35
GOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAL!

wdennis
2018-01-06 16:36
```
root@testnode03:~# kubectl get nodes
NAME          STATUS    ROLES     AGE       VERSION
k8s-ingress   Ready     <none>    1m        v1.9.1
testnode01    Ready     <none>    11d       v1.9.0
testnode02    Ready     <none>    11d       v1.9.0
testnode03    Ready     master    12d       v1.9.0
testnode04    Ready     <none>    11d       v1.9.0
```

wdennis
2018-01-06 16:37
Thanks fellas :slightly_smiling_face:

shane
2018-01-06 16:39
(scratch that method I posted - not right)

greg
2018-01-06 16:48
@wdennis - cool

wdennis
2018-01-06 16:57
Learned a lot, so all good... "all's well that ends."

wdennis
2018-01-06 16:59
Would it/could it be a thing that krib-install checks for join token validity and re-gens & stores new if no longer valid?

greg
2018-01-06 17:02
Well - this would be an example of a maintenance operation workflow.

greg
2018-01-06 17:02
A stage/task that would regen the token and update the profile from the k8s master. You would set the k8s-master machine to that stage, it would update the token and go back to wait.

wdennis
2018-01-06 17:07
I was thinking check token, and if invalid (missing) then regen & update

wdennis
2018-01-06 17:07
But yes, separate task I suppose

shane
2018-01-06 17:09
there are lots of OPS related ways to do this ... which isn't our business model ... certainly need to consider if/how we want to approach the Day 2 side of things - but ... for each Operations task we bake in to DRP ... the more people will want to bake in - then it becomes rigid in how it works - which is the problem with DRv2 - we specified too tightly how operations should be handled by how things were built

wdennis
2018-01-06 17:09
Put that in front of krib-install and "Bob's your uncle" (for at least my failure case)

shane
2018-01-06 17:09
KRIB is a demonstration workload - not our business model

shane
2018-01-06 17:10
similarly - you could ask us to fix HA control plane in kubeadm - so that KRIB can bake an HA cluster ... but we're not going to touch that with a 100 foot pole

wdennis
2018-01-06 17:11
Ok, not going that far...

shane
2018-01-06 17:11
my point is: it's a slippery slope to start going down ... each tiny iterative change "seems like a good idea" at the time ... :slightly_smiling_face:

wdennis
2018-01-06 17:12
And sometimes to be honest it does seem to me like DRP is a k8s deployment system...

shane
2018-01-06 17:12
I do agree that we might consider how to make cluster join with the token after expiry easier

wdennis
2018-01-06 17:12
I get where you guys are coming from...

shane
2018-01-06 17:12
but NOT for KRIB's sake - for the sake of a generic model for any cluster management tooling going forward - a pattern that makes sense for the larger ecosystem

shane
2018-01-06 17:13
we could simply change the KRIB content to generate a non-expiring token

shane
2018-01-06 17:13
voilà ... job finished

wdennis
2018-01-06 17:13
There you go...

wdennis
2018-01-06 17:14
KRIB to me is an opinionated k8s cluster deployer

shane
2018-01-06 17:14
that is exactly the rub ... that I'm getting at

shane
2018-01-06 17:15
we don't want to be in the business of "opinionated installers"

shane
2018-01-06 17:15
our business is helping you get your installation path up and running

shane
2018-01-06 17:15
but realistically - we have to demonstrate how the system works - for others to understand and take it up

wdennis
2018-01-06 17:15
So for a guy like me just learning about k8s and wanting my own bare-metal cluster, I was trying to leverage your installer

shane
2018-01-06 17:15
and eventually extend it for their own operational models

wdennis
2018-01-06 17:16
Granted I'm ignorant at this point on kubeadm as well as many other k8s things...

wdennis
2018-01-06 17:18
So maybe more caveats from the RackN side ("just an example installer, not for production use" etc) may be warranted...

shane
2018-01-06 17:18
it is a good learning experience - for you - and for us - as we learn how to operate ... Operational things within DRP - we find areas to extend, fix, enhance, and make better

wdennis
2018-01-06 17:18
Agreed

shane
2018-01-06 17:18
that caveat exists for `kubeadm` itself ... :slightly_smiling_face:

wdennis
2018-01-06 17:18
Yeah, I'm learning that...

zehicle
2018-01-06 19:11
+1 "generic model for any cluster management tooling" < KRIB helps find patterns for immutable deploys that DRP should facilitate. I would love to see a collaborative approach where Kubeadm / Kubespray did things that leverage DRP node ready state and cluster/profile metadata to make the install & Day 2 easier

zehicle
2018-01-06 19:13
RE: Kubespray vs KRIB - it would be great to let Kubespray set up the control plane and then use Kubeadm join to attach the nodes. That pattern would work for hybrid cloud managed control too.

shane
2018-01-06 20:10
Nice ... someone forgot to sign the CentOS Repo pkgs for kubernetes 1.9.1 ??? ```Package efde37cfcd34c8232daafb0337b8ba5fda70100ab6988fca71ba30ce929311dd-kubelet-1.9.1-0.x86_64.rpm is not signed```

greg
2018-01-06 22:05
Yeah - I noticed that too. I turned off the checking, but it is wrong.

ctrees
2018-01-06 22:14
since you're on... I've got screen size check params... default check mobile (aka small screen)

ctrees
2018-01-06 22:15
what's the min screen width before you tell someone to get a real monitor :wink:

greg
2018-01-06 22:18
4k ultra HD

ctrees
2018-01-06 22:18
yea... cool... can you get rob to send me one for testing ?

greg
2018-01-06 22:19
Actually, I don't know if we've set one. There are known issues between @zehicle and me because he tests with the firefox debugger open on the right side, while I don't have it open until I need it.

ctrees
2018-01-06 22:19
I'm sure they are on sale at Fry's

greg
2018-01-06 22:19
I'd have to get one first.

ctrees
2018-01-06 22:20
wait... that's a trick... you don't EVER need one open...

ctrees
2018-01-06 22:20
viewports: [{ width: 1024, height: 768 }],

greg
2018-01-06 22:20
My guess would be something like 800x600, but 1024 would be better.

ctrees
2018-01-06 22:20
just going with the one... it's easy for 'some-one-who-cares-more' to run more ...

greg
2018-01-06 22:21
I, in general, can operate the system without needing the debugger window or cli.

ctrees
2018-01-06 22:21
for sure this thing will be able to auto-gen screen-shot-UX steps :wink:

greg
2018-01-06 22:21
cool

ctrees
2018-01-06 22:25
sorry for the spew... but...

ctrees
2018-01-06 22:25
catmini:drpfeature msops$ yarn run test:po
yarn run v1.3.2
$ yarn run wdio wdio.PageObjectTest.conf.js
$ /Users/msops/Code/drpfeature/node_modules/.bin/wdio wdio.PageObjectTest.conf.js
------------------------------------------------------------------
[chrome #0-0] Session ID: ca8bb510bc5543bba0d6fe9a6a5fbf5e
[chrome #0-0] Spec: /Users/msops/Code/drpfeature/src/pospecs/login.spec.js
[chrome #0-0] Running: chrome
[chrome #0-0]
[chrome #0-0] drp-ux auth form
[chrome #0-0]   - should deny access with wrong creds
[chrome #0-0]   ✓ should allow access with correct creds
[chrome #0-0]
[chrome #0-0]
[chrome #0-0] 1 passing (9s)
[chrome #0-0] 1 pending
[chrome #0-0]
:sparkles: Done in 13.29s.
catmini:drpfeature msops$ yarn run test:po
yarn run v1.3.2
$ yarn run wdio wdio.PageObjectTest.conf.js
$ /Users/msops/Code/drpfeature/node_modules/.bin/wdio wdio.PageObjectTest.conf.js
------------------------------------------------------------------
[chrome #0-0] Session ID: 6ef9a9164beef7c5670bb61db1c6dca5
[chrome #0-0] Spec: /Users/msops/Code/drpfeature/src/pospecs/login.spec.js
[chrome #0-0] Running: chrome
[chrome #0-0]
[chrome #0-0] drp-ux auth form
[chrome #0-0]   - should deny access with wrong creds
[chrome #0-0]   1) should allow access with correct creds
[chrome #0-0]
[chrome #0-0]
[chrome #0-0] 1 pending (10s)
[chrome #0-0] 1 failing
[chrome #0-0]
[chrome #0-0] 1) drp-ux auth form should allow access with correct creds:
[chrome #0-0] visCheck: System Management Check Fail: expected false to equal true
[chrome #0-0] AssertionError: visCheck: System Management Check Fail: expected false to equal true
[chrome #0-0]     at /Users/msops/Code/drpfeature/src/pageobjects/page.js:13:53
[chrome #0-0]     at Array.forEach (<anonymous>)
[chrome #0-0]     at Page.visCheck (/Users/msops/Code/drpfeature/src/pageobjects/page.js:12:13)
[chrome #0-0]     at Context.<anonymous> (/Users/msops/Code/drpfeature/src/pospecs/login.spec.js:52:20)
[chrome #0-0]     at new Promise (<anonymous>)
[chrome #0-0]     at new F (/Users/msops/Code/drpfeature/node_modules/core-js/library/modules/_export.js:35:28)
[chrome #0-0]
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
catmini:drpfeature msops$

ctrees
2018-01-06 22:26

ctrees
2018-01-06 22:28
purple is the css animation drift :wink:... but because I turned sensitivity up to '11' (spinal tap reference)

ctrees
2018-01-07 03:19
so... if you PAY for the aws 'stuff' esp the login... didn't they provide testing hooks ?.... there is something funny different about that darn pile of amazoncognito caca... I can't get focus now (or did someone change something)

ctrees
2018-01-07 03:41
Oh... hidden elements named the same but only rendered react .... I think... we need to talk a css style guide or 'something'...

ctrees
2018-01-07 03:43
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8Q0ZTM2A/screen_shot_2018-01-06_at_9.41.51_pm.png and commented: ... hacking around 4 now by guessing at the element #...

mclamb
2018-01-08 02:25
has joined #json



shane
2018-01-08 02:36
@mclamb welcome

mclamb
2018-01-08 02:37
Thanks, @shane! Saw Rob's talk at KubeCon, been generally interested in replacing MaaS with DRP lately...

shane
2018-01-08 02:37
@mclamb here is some documentation on the KRIB stuff - including links to the videos: http://provision.readthedocs.io/en/latest/doc/integrations/krib.html

mclamb
2018-01-08 02:38
One question I had was -- Are there two different products? Digital ReBar Provision (which is part of) Digital Rebar?

shane
2018-01-08 02:38
Digital Rebar Provision (ver 3) - is "the next version of Digital Rebar ver 2"

shane
2018-01-08 02:39
they are very different products and DRP is meant to succeed "Digital Rebar ver 2" (which is now EOL)

mclamb
2018-01-08 02:39
OK.. there seems to be some wording on the site about how DRP can be used "standalone" or as part of Digital Rebar

shane
2018-01-08 02:39
we're working on cleaning up confusing documentation relating to that - can you point to where you are referring ?

mclamb
2018-01-08 02:39
yeah lemme find it

shane
2018-01-08 02:39
thanks !


mclamb
2018-01-08 02:40
top of the page

mclamb
2018-01-08 02:40
"It is designed to stand alone or operate as part of the Digital Rebar management system."

shane
2018-01-08 02:42
ah yes - that's left over - I'll slap a change in right now to clean that up - thx for pointing that out

mclamb
2018-01-08 02:44
Given DRP is the product to use, I understand that its focus is bare metal provisioning. But you do have IPMI proxies for http://Packet.net (and Virtualbox?)... I was wondering if there are other cloud providers for which you do something similar? Digital Ocean, GCP, etc.?

shane
2018-01-08 02:45
ok - that doc has been cleaned up - in a couple minutes the changes should push to the RTD page

shane
2018-01-08 02:45
yes and yes

mclamb
2018-01-08 02:45
I have to deploy to on-prem bare metal for operations, but trying to use cloud for dev/test/etc. It would be nice to use DRP across all and maybe even use the same Terraform code (for DRP)... I guess it is not a huge deal if I have terraform for DRP, then Terraform for Cloud VM providers

shane
2018-01-08 02:45
we support via Plugins - both http://packet.net and virtualbox IPMI like power commands

shane
2018-01-08 02:46
as well as bare metal via IPMI as implemented by a hardware Baseboard Management Controller (BMC - eg. iDRAC, iLO, etc)

shane
2018-01-08 02:49
for the http://packet.net environment - you can see a basic example of spinning up a DRP Endpoint (provisioning server) and provisioning _N_ number of packet machines against that DRP Endpoint, in the example I wrote: https://github.com/digitalrebar/provision/tree/master/examples/pkt-demo

mclamb
2018-01-08 02:49
Cool thx

mclamb
2018-01-08 02:50
You might also consider changing the provision repo description too :wink:

shane
2018-01-08 02:51
you can check out an older YouTube vid on Mac OSX quickstart - which includes VirtualBox info - I haven't reviewed this video yet, but I think it should still be relevant - since the VirtualBox plugin hasn't changed: https://www.youtube.com/watch?v=uUWU-4ObGIY

mclamb
2018-01-08 02:51
Ok will definitely check those out

mclamb
2018-01-08 02:51
thanks for the tips

shane
2018-01-08 02:52
(all references to the UI in that video are no longer valid - but the info is still good - the new UI is vastly improved and much better)

shane
2018-01-08 02:53
this description: _"The Provisioner for DigitalRebar as a Stand Alone Golang Utility"_

shane
2018-01-08 02:53
??

mclamb
2018-01-08 02:53
Yeah... similar to the RTD change you just made, it suggests that it's a part of a larger piece?

shane
2018-01-08 02:53
only because I think you're connecting the two dots ... but I agree it could be worded much better

shane
2018-01-08 02:54
:slightly_smiling_face:

mclamb
2018-01-08 02:54
Further down in the repo it says "DR Provision is a APLv2 simple Golang executable that provides a simple yet complete API-driven DHCP/PXE/TFTP provisioning system" which seems more precise! :slightly_smiling_face:

mclamb
2018-01-08 02:54
Hah, yeah sorry to be so pedantic, but all cleared up for me now

shane
2018-01-08 02:55
no worries - fresh eyes and an outside perspective are refreshing - we sorta "pass over" some things and don't realize they're still in need of tweaking

shane
2018-01-08 02:56
well - it looks like my Corporate Overlord (@greg) doesn't trust me with "Settings" of the _digitalrebar_ repo - so I can't make that change ...

mclamb
2018-01-08 02:56
One last question before I run -- during the bare metal provisioning phase (in discover stage I presume?) I can configure networking and disks (e.g. create LACP bonds, VLANs, RAID devices, etc.)?

shane
2018-01-08 02:57
yes - but we don't have "content" that does that currently - you'd have to write a Stage - which ultimately would just implement the configuration you desire for a given environment

shane
2018-01-08 02:58
typically this would be a simple Bash shell - but you can do this any number of ways ... it would be run in the Sledgehammer image and once you advance a Machine through the Workflow stages, one of the stages would be your specific configuration for networking

shane
2018-01-08 02:59
one example would be to setup your own Configuration Management tooling of your choice (eg Saltstack, Ansible, Puppet) etc. and then you could run appropriate CfgMgmt tooling to do that

mclamb
2018-01-08 02:59
How would that network config stage then get "injected" into /etc/network/interfaces (or the like) once the final OS is installed?

mclamb
2018-01-08 02:59
Ahh... OK yeah, could just do it all via Ansible

shane
2018-01-08 02:59
the KRIB workflow that @zehicle demonstrated is an example of Stages that advance the Machine to a given end-state - your network config would be inserted in that chain of stages to "do its thing" for you

shane
2018-01-08 03:00
Sledgehammer (discovery image) is just a Live Boot linux distro - so it's running in-mem and does all the Machine prep - including implementing and handling all of the Workflow stages

mclamb
2018-01-08 03:01
ok will start playing around with it soon! thanks for the help

shane
2018-01-08 03:01
you bet, drop by if you need any help/pointers as you work through it

shane
2018-01-08 03:02
that KRIB doc I pointed you at I only just recently wrote - so any feedback on it is appreciated

greg
2018-01-08 03:03
@shane - your "corporate overload" says give it a shot now.

shane
2018-01-08 03:04
woot! you'll probably regret this day ... just sayin'

greg
2018-01-08 03:05
Already do

greg
2018-01-08 03:05
:slightly_smiling_face:

shane
2018-01-08 03:06
:stuck_out_tongue_winking_eye:

mclamb
2018-01-08 04:29
@shane Here's another reference in the docs to "the larger Digital Rebar system": http://provision.readthedocs.io/en/latest/doc/arch/server.html - Section 4.1

shane
2018-01-08 04:34
fixed

pmorris
2018-01-08 16:29
has joined #json

zehicle
2018-01-08 16:42
Hello @pmorris! welcome

pmorris
2018-01-08 16:43
Hello @zehicle! Thanks :smile:

2018-01-08 16:44
uhghhhh

shane
2018-01-08 16:44
uhggggggghhhhhhh!

2018-01-08 16:45
LOL come on im allowed to ughhhhh im prepping to work through this complex looking KRIB

2018-01-08 16:45
:)

2018-01-08 16:45
cuz when i yell... youll all go... ughhhh .... not again :)

shane
2018-01-08 16:47
it's not very complex - the videos that @zehicle did should help, in conjunction with the KRIB doc in our RTD site

shane
2018-01-08 16:47
I'd go so far as to say ... it's probably the absolute easiest Kubernetes install you'll find out there

joel
2018-01-08 16:48
has joined #json

shane
2018-01-08 16:49
@joel welcome

2018-01-08 16:54
@rackneng videos ???

shane
2018-01-08 16:55
see the KRIB documentation - it has 2 links to videos: http://provision.readthedocs.io/en/latest/doc/integrations/krib.html

shane
2018-01-08 16:55
the first one is probably the better one (the KubeCon presentation is longer and a bit broader than just KRIB)

2018-01-08 17:03
ok so question... is there anyway to get rebar to start at boot ???

2018-01-08 17:03
i dont see any "init" scripts

shane
2018-01-08 17:04
if you install NOT in the "--isolated" mode - we drop the correct startup scripts in place

shane
2018-01-08 17:04
if you did an --isolated install - you can still add the init scripts - I'd just suggest curl'ing down the install.sh script, and pull the init scripts out of there: ```curl -s get.rebar.digital/stable -o /tmp/install.sh```

2018-01-08 17:06
no i mean i have to login as root and run ./dr-provision --static-ip=148.251.24.11 --base-root=/home/dingo/drp-data --local-content="" --default-content="" &

shane
2018-01-08 17:06
yes

2018-01-08 17:06
everytime i boot the vm

shane
2018-01-08 17:07
you did the install in `--isolated` mode (non-production mode)

2018-01-08 17:07
oh ?

2018-01-08 17:07
hell

shane
2018-01-08 17:07
isolated mode does not install start up scripts

2018-01-08 17:07
can i change it ?

shane
2018-01-08 17:07
production mode does

2018-01-08 17:08
kind of afraid to mess with it now that its working

shane
2018-01-08 17:08
you can do one of two things: 1) reinstall in production mode (remove the `--isolated` flag during install) - you'll have to move your content back in to place 2) just add start up scripts that point to your current install location

shane
2018-01-08 17:08
in isolated mode, everything is self-contained in your `/home/dingo/drp-data` directory

2018-01-08 17:08
yupp

shane
2018-01-08 17:09
you could simply move this to a root folder location (stop dr-provision first) - for example:
```
pkill dr-provision
cp -r /home/dingo/drp-data /srv/drp
```

shane
2018-01-08 17:09
then add start up scripts to reference the install in your new location

shane
2018-01-08 17:11
the `install.sh` script I referenced above has all of the BASH syntax to add the init scripts - which ones you'll need depends on what Linux distro you're using

shane
2018-01-08 17:12
you can pull the init scripts from our github repo as well: https://github.com/digitalrebar/provision/tree/master/assets/startup
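
For a systemd-based distro, the "add start up scripts" step could look roughly like the sketch below. This is not the script shipped in `install.sh` or in `assets/startup`, and those should be preferred where they fit. The function name `print_drp_unit`, the unit layout, and the example paths are assumptions; the dr-provision flags are simply the ones from the manual command quoted earlier in this conversation.

```shell
#!/bin/sh
# print_drp_unit BIN BASE_ROOT STATIC_IP
# Prints a minimal systemd unit for an isolated dr-provision install.
# All three arguments are site-specific; nothing here is a DRP default.
print_drp_unit() {
    cat <<EOF
[Unit]
Description=dr-provision (isolated install, started at boot)
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=$1 --static-ip=$3 --base-root=$2 --local-content= --default-content=
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example (paths are assumptions, run as root):
#   print_drp_unit /srv/drp/dr-provision /srv/drp/drp-data 148.251.24.11 \
#       > /etc/systemd/system/dr-provision.service
#   systemctl daemon-reload && systemctl enable --now dr-provision
```

Printing the unit to stdout instead of writing it directly makes it easy to review before anything lands in /etc.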

2018-01-08 17:36
uhhhohhh dr-provision2018/01/08 17:35:47.209450 dataTracker: Error loading data: Failed to load backing objects from cache: Unable to load machines: unexpected end of JSON input

2018-01-08 17:36
i broke sumthin

greg
2018-01-08 17:38
Did you update DRP to tip?

2018-01-08 17:38
nope same install as its been just rebooted it

greg
2018-01-08 17:39
change paths?

2018-01-08 17:39
nope

2018-01-08 17:39
guess ill blow it away and do it proper this time

2018-01-08 17:40
ughhh i have machines provisioned from it though

2018-01-08 17:40
crap

greg
2018-01-08 17:40
well - first step is to cd to database and look at machine files to make sure they are valid json

greg
2018-01-08 17:40
cd drp-data/digitalrebar/machines

greg
2018-01-08 17:41
(for isolated mode).

2018-01-08 17:41
k soon as i finished backing up the directory

2018-01-08 18:15
-rw-r--r-- 1 root root 0 Jan 7 07:12 8338b877-f417-485b-836e-d11feff9e860.json

2018-01-08 18:15
dat just aint right

shane
2018-01-08 18:17
what's not right about it?

shane
2018-01-08 18:17
that's the UUID of a Machine, with the .json extension - is the contents wrong ?

shane
2018-01-08 18:17
ah - zero size

2018-01-08 18:17
0 bytes ?

shane
2018-01-08 18:17
nope - it shouldn't be zero size
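
A quick way to spot this kind of damage across the whole machines directory is sketched below. It is only an illustration: the function name `check_machines` is made up, the directory is whatever your install uses (for isolated mode, the `drp-data/digitalrebar/machines` path mentioned above), and the JSON parse check only runs if `jq` happens to be installed.

```shell
#!/bin/sh
# check_machines DIR
# Lists machine files that cannot be valid JSON.
check_machines() {
    dir=$1
    for f in "$dir"/*.json; do
        [ -e "$f" ] || continue      # the glob matched nothing
        if [ ! -s "$f" ]; then
            echo "EMPTY $f"          # zero bytes: nothing to parse
        elif command -v jq >/dev/null 2>&1 && ! jq empty "$f" >/dev/null 2>&1; then
            echo "INVALID $f"        # jq could not parse the file
        fi
    done
}

# Example: check_machines drp-data/digitalrebar/machines
```

The size test comes first because an empty file is exactly the "unexpected end of JSON input" case from the error above. Back up the directory before moving or deleting anything it reports.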

2018-01-08 18:17
prolly something i booted and shutoff

2018-01-08 18:18
since i have vms that are already built before rebar, but pxe anyway

2018-01-08 18:18
have to get them off pxe

2018-01-08 18:19
now.... back to trying to configure for KRIBs

dunger
2018-01-09 02:45
has joined #json

zehicle
2018-01-09 15:57
Hello @dunger ! good to see you

shane
2018-01-09 16:36
welcome @dunger

shane
2018-01-10 01:16
- last minute notice - but I'm presenting basics about Digital Rebar Provision on the Boise Idaho meetup group - anyone interested is welcome to join via Zoom conference, at: https://zoom.us/j/4084048118

shane
2018-01-10 01:16
starts at 5:30 pm PST

ctrees
2018-01-11 03:16
was attempting to look at the drp-api

ctrees
2018-01-11 03:17
catmini:drpfeature msops$ curl -X GET --header --insecure 'Accept: text/plain' 'https://192.168.1.200:8092/api/v3/isos'
curl: (3) Port number ended with ' '
curl: (60) SSL certificate problem: unable to get local issuer certificate
More details here: https://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option.
HTTPS-proxy has similar options --proxy-cacert and --proxy-insecure.
catmini:drpfeature msops$

ctrees
2018-01-11 03:17
Docs say something about an Authorize button


ctrees
2018-01-11 03:19
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8S7B24FR/drp-api-swagger.png and commented: I sort of remember the Authorized button or put a token in somewhere... but forgot where how

greg
2018-01-11 03:27
@ctrees the `Accept` part needs a -H

greg
2018-01-11 03:27
You also need to add -H "Content-Type: application/json"

greg
2018-01-11 03:28
I do this: ```curl -k -u rocketskates:r0cketsk8ts -X POST https://127.0.0.1:8092/api/v3/machines/a2b8c3e8-d524-48d7-9ecd-83ad460836a2/actions/increment -H "Content-Type: application/json"```

greg
2018-01-11 03:29
The Accept header is nice but not required.

zehicle
2018-01-11 04:59
You can also use the bearer token header.
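
Putting those corrections together, a small wrapper along these lines keeps the flags in one place. It is a sketch, not part of DRP: `drp_api`, `DRP_URL`, `DRP_CREDS`, and `DRP_TOKEN` are hypothetical names, the defaults are just the values from the curl example above, and how the token is obtained is left out.

```shell
#!/bin/sh
# Hypothetical defaults taken from the example above; adjust to your endpoint.
DRP_URL=${DRP_URL:-https://127.0.0.1:8092}
DRP_CREDS=${DRP_CREDS:-rocketskates:r0cketsk8ts}

# drp_api METHOD PATH [extra curl args...]
# Uses a bearer token if DRP_TOKEN is set, basic auth otherwise.
drp_api() {
    method=$1; path=$2; shift 2
    if [ -n "${DRP_TOKEN:-}" ]; then
        set -- -H "Authorization: Bearer $DRP_TOKEN" "$@"
    else
        set -- -u "$DRP_CREDS" "$@"
    fi
    # -k skips certificate verification (self-signed cert); each header gets its own -H.
    curl -k -s -X "$method" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        "$@" "$DRP_URL/api/v3/$path"
}

# Example: drp_api GET isos
```

The original command failed because `--header` consumed `--insecure` as its value, which left `'Accept: text/plain'` to be parsed as a URL; pairing every header with its own `-H` avoids that.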

ctrees
2018-01-11 14:41
So how are you testing the api currently ? Something in golang I take it


greg
2018-01-11 14:57
Yes

ctrees
2018-01-11 14:57
I see things like iso_test.go which looks like a test of the iso command... so I'm going to follow that pattern

ctrees
2018-01-11 14:57
aka I should be able to get the same data from cli, api, ux (and file system for that matter)

greg
2018-01-11 14:58
More in a second

ctrees
2018-01-11 15:00
half thinking about sticking in some BATS as bash seems to be your native go-to and it may help my brain wrap around the cli better while staying out of the golang

greg
2018-01-11 15:04
well - I use the cli for most things. Only if I need to really make sure the raw json is working or the cli is busted do I revert to curl. The UX does raw calls and gets json blobs "auto-morphed" into javascript objects. This is a little annoying, but passable.

greg
2018-01-11 15:04
There is one thing to realize about the golang tests that is really powerful, but tricky to non-golang systems.

greg
2018-01-11 15:05
The DRP server can be instantiated as an internal service inside other golang programs.

greg
2018-01-11 15:05
This is amazingly powerful for unit tests and integration tests.

ctrees
2018-01-11 15:06
yea I figured that nesting was also how the runner and that que stuff works

ctrees
2018-01-11 15:06
? right ?

greg
2018-01-11 15:06
The structures are better separated for that. The runner only needs the models, cli, and api directories.

greg
2018-01-11 15:07
But the code is shared.

greg
2018-01-11 15:08
Effectively, the unit tests run as two halves. The server side (models, backend, midlayer, and server directories mostly) and the api or cli side (models, api, and/or cli).

ctrees
2018-01-11 15:10
so does travis run the units then ? (as I can trace how they run from that eventually)

greg
2018-01-11 15:10
For example, in the terraform drp provider, I use this to test the terraform provider without having to spin up infrastructure. In the tests, I start a server, run terraform tests against the internal server, and validate api driving.

greg
2018-01-11 15:10
Yes.

greg
2018-01-11 15:10
There are scripts in the tree that travis calls.

greg
2018-01-11 15:11
On my mac, I have a golang build env at 1.9 setup.

greg
2018-01-11 15:11
I run: `ulimit -n 2560`

greg
2018-01-11 15:11
Then I can run:

greg
2018-01-11 15:11
`tools/build.sh`

greg
2018-01-11 15:11
This will build all the platforms and do swagger stuff. It will also attempt to add missing components like glide and swagger.

greg
2018-01-11 15:12
I think. Travis does this as well.

greg
2018-01-11 15:12
Then when this is done, I can run: `tools/test.sh`

greg
2018-01-11 15:12
This runs all the unit tests in all the directories.

greg
2018-01-11 15:12
It can take 5-10 minutes.

greg
2018-01-11 15:13
This runs go test with atomic verification and profiling.

greg
2018-01-11 15:13
You can also cd into a directory and run go test without all that.
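
The sequence described above could be wrapped up as in the sketch below. Only `ulimit -n 2560`, `tools/build.sh`, and `tools/test.sh` come from this conversation; the function name `run_drp_tests` and the checkout path are assumptions.

```shell
#!/bin/sh
# run_drp_tests REPO_DIR
# REPO_DIR is a checkout of the provision repo (location is an assumption).
run_drp_tests() {
    (
        # Subshell: the cd and the ulimit change do not leak into the caller.
        cd "$1" || exit 1
        # Set the open-file limit first; warn but carry on if the OS refuses.
        ulimit -n 2560 2>/dev/null || echo "warning: could not set ulimit -n" >&2
        # Build first, and only run the test suite if the build succeeded.
        tools/build.sh && tools/test.sh
    )
}

# Example: run_drp_tests "$HOME/src/provision"
```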

ctrees
2018-01-11 15:14
yea... I started to follow the test.sh to look for tests and data to basically replicate out to ux

greg
2018-01-11 15:14
The main thing we don't test really well is the DHCP server.

greg
2018-01-11 15:14
Most of the tests build their own data. We start empty except for the "constants" (local stage, none stage, local bootenv, ignore bootenv).

greg
2018-01-11 15:14
global profile

greg
2018-01-11 15:16
@vlowther recently changed the style of the unit tests. They are more file based than internal string based. This is good. So in the cli directory, there is a data directory that contains expected files.

greg
2018-01-11 15:16
When you run the tests, the output files get built and diffed.

greg
2018-01-11 15:16
There is a script a dev can run to "fix-up" the expected files.

vlowther
2018-01-11 15:16
yeah, fixInteractive.sh

greg
2018-01-11 15:17
This is really useful when I change the cli usage text and have to change all the files. :slightly_smiling_face:
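
The expected-file pattern described here can be sketched generically as below. This is not how the DRP tests or `fixInteractive.sh` are actually written (those are not shown in this conversation); `check_golden` and `UPDATE_GOLDEN` are made-up names that only illustrate the idea of diffing output against a stored file and regenerating it on request.

```shell
#!/bin/sh
# check_golden EXPECTED_FILE COMMAND [ARGS...]
# Runs COMMAND, captures its output (stdout and stderr), and diffs it
# against EXPECTED_FILE. With UPDATE_GOLDEN=1 the expected file is
# rewritten instead, which is the "fix-up" step.
check_golden() {
    expected=$1; shift
    actual=$(mktemp)
    "$@" > "$actual" 2>&1 || true
    if [ "${UPDATE_GOLDEN:-0}" = 1 ]; then
        cp "$actual" "$expected"
        status=0
    elif diff -u "$expected" "$actual"; then
        status=0
    else
        status=1                     # diff already printed what changed
    fi
    rm -f "$actual"
    return "$status"
}

# Example: check_golden data/usage.expected mycli --help
```

Because the comparison is a plain diff, even a one-character change in usage text shows up, and the update mode makes accepting an intended change cheap.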

ctrees
2018-01-11 15:17
Oh... good...

greg
2018-01-11 15:18
The implication to that is we can catch very minor changes and decide if we are okay with them floating out.

greg
2018-01-11 15:18
On the API and CLI side.

ctrees
2018-01-11 15:19
that's sort of what I was looking for... is use the test data generation / checking for stuff you guys have done as population for data checks in the UX (eventually)

ctrees
2018-01-11 15:20
What I decided (last night) is I should KISS for now and replicate a simple demo first and see if Shane/Rob/Isaac are willing to use... so I switched back to using data in quickstart

greg
2018-01-11 15:20
@ctrees - you are amazing and gutsy.

vlowther
2018-01-11 15:20
arrgh -- I was going to point you at an example, but Github is giving me the angry pink unicorn!

greg
2018-01-11 15:21
So - a couple of things.

vlowther
2018-01-11 15:21

ctrees
2018-01-11 15:22
no... I've been through this before and have a fascination with toilets and all UX is just 'shiny object' to distract people from the 'shit' they create :wink:

vlowther
2018-01-11 15:22
hah

ctrees
2018-01-11 15:22
woops... this is public I should retract that statement...

ctrees
2018-01-11 15:22
'flush'

greg
2018-01-11 15:23
I can see three issues with this. 1. you need a server to test against. Drp is lightweight, it can just run in most locations, so it is probably fine, but you have to manage start and stop. 2. data to manipulate - some prestaged content packs should be good for that. 3. UX ids to aid in manipulating content.

greg
2018-01-11 15:24
@ctrees - I view it the other way around. crappy UX until you can get the glory of pure API underneath. :slightly_smiling_face:

ctrees
2018-01-11 15:25
yup... #3 is why I need Rob/Isaac (I think right) or someone who LIKES shiny ( was thinking Shane ) to leverage more of the Experience of UX testing...

ctrees
2018-01-11 15:27
I'm right at that point now... which is why I keep saying 'RackN-DSL'... cause it's marketing buz I think Rob can leverage but it'll help in tie'n CSS to the React 'auto-dynamic-gen'

greg
2018-01-11 15:28
So - I've let Rob and Isaac run with the UX, but have been trying to get testing off and on. We have limited resources and I have limited knowledge around the testing side.

greg
2018-01-11 15:29
What I want/need is to understand what is needed to make the testing easier to write and effective.

ctrees
2018-01-11 15:30
Oh.. the UX RackN-CSS-DSL will help...

greg
2018-01-11 15:30
I can easily solve items #1 and #2. What I need to understand ( and full disclosure: haven't looked at your tree yet (plugin rewrite is eating my time)) so don't know what it is trying to do for sure and how. I think I know, but ...

ctrees
2018-01-11 15:31
humm... maybe you should just run the test I've got... from the structure you'll grok what I'm getting at 'I THINK'... cause it's the same patterns you and victor are doing low level...

greg
2018-01-11 15:31
okay - that is what I want to understand.

greg
2018-01-11 15:32
but angry unicorn says NO!

ctrees
2018-01-11 15:32
it's pretty simple... in the UX things like "RackN Portal Login" on the button

ctrees
2018-01-11 15:33
should be associated with css class "rackn-login-redirect" and React does some duplication, so you have to figure out how to tag it unique

ctrees
2018-01-11 15:33
sort of the same with 'generating dynamic test data' vs what Victor has done ?? I think ??

ctrees
2018-01-11 15:35
.... I got a simple login working... which deals with aws cognito ... let me get the iso check feature test working and then if you and victor look at that... I think you'll both grok it pretty fast...

ctrees
2018-01-11 15:37
after that... hopefully the guys who CARE what the UX looks like (NOT ME, NOT YOU) for sure I'll ask wdennis :wink: cause he sort of put me on this path...

ctrees
2018-01-11 15:39
... it's a lot of work to keep the UX sane UNLESS you hook it to something like what I think vlowther did for golang unit

ctrees
2018-01-11 15:40
... what's angry unicorn ?

ctrees
2018-01-11 15:41
Oh

ctrees
2018-01-11 15:41

greg
2018-01-11 15:41
github was/is down.

greg
2018-01-11 15:41
okay - I'll look at it when it comes back up.

ctrees
2018-01-11 15:43
you can wait till I check-in the iso list feature test... that'll be a better example for you... the login test only deals with the aws 'poo'

greg
2018-01-11 15:45
ok

zehicle
2018-01-11 15:52
@ctrees we're working to change the redirect into an API call from the UX - which may make it easier to test

ctrees
2018-01-11 15:53
naw I'm through that, that's the test that is working now

ctrees
2018-01-11 15:54
plus when you go with the proxy redirects like in the KRIB demo... that basically is the same thing (you have to pick up auth state dynamically somewhere)

ctrees
2018-01-11 15:58
if you're messing with the UX CSS... I would LOVE to chat about that.. put a comment in Issue 627


ctrees
2018-01-11 16:01
The 'theme' pattern is a great place to put in a good 'RackN-DSL' css string pattern that would make feature -> pageobject -> rackn-dsl -> drpapi/drpcli data mapping automated sort of the same way swagger does for api generation

ctrees
2018-01-11 16:10
the biggest UX testing barrier is what you mentioned (that I was not aware of at the time) ReactJS generates the DOM dynamically AND it will generate multiple Identical Elements... which basically kills most common script UX testing selector techniques... I did figure a way around it BUT it'll make the test super fragile... there are React pattern techniques to prevent identical elements BUT you might as well fix the CSS templating and create rackn pageobjects as well...

ctrees
2018-01-11 16:14
anyway... I should have a UX test to verify iso out today... I'll post here when it's ready... it should be less than 20 min 'distraction' to try and I HOPE that'll show enough information to evaluate if this pattern is worth supporting

greg
2018-01-11 16:14
cool - i'll try and learn more

ctrees
2018-01-11 16:14
... Oh... but the point of a good UX is to learn LESS :wink:

greg
2018-01-11 16:14
:slightly_smiling_face:

greg
2018-01-11 16:15
I don't ever get to learn less.

rcameron
2018-01-11 16:17
@rcameron has left the channel

wdennis
2018-01-11 20:25
Sometimes a good UX is a real timesaver and a lower-barrier entrypoint...

ctrees
2018-01-11 21:01
I agree... I just have to complain when I'm working ... BTW I do want you to attempt to run some of this... it's pretty fragile right now as I figure out what React is doing in the DOM...

ctrees
2018-01-11 21:02
I've attempted to push test into the browser as common practice since '96'....

ctrees
2018-01-11 21:04
but this UX does look nice :pray:

wdennis
2018-01-11 21:04
@greg Trying to UEFI-boot a new Dell R640 w/ DRP, getting this:

2018-01-11 21:04
Time to feed the :bear:!


vlowther
2018-01-11 21:10
@wdennis What is option 67 set to for that subnet?

vlowther
2018-01-11 21:15
UEFI requires a different bootloader, you cannot just use pxelinux like we do by default.

wdennis
2018-01-11 21:45
Thx @vlowther... what is the correct param?

vlowther
2018-01-11 21:49
There are a couple, depending on how much you like ipxe

vlowther
2018-01-11 21:50
I have them saved as pinned messages.

vlowther
2018-01-11 21:51
{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}ipxe.pxe{{else}}ipxe.efi{{end}} <-- if you like ipxe
{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}} <-- if you don't like ipxe
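
Read as plain logic, the non-iPXE template picks a boot file from two DHCP options the client sends: option 77 (user class) and option 93 (client system architecture). The sketch below restates that decision in shell purely as an illustration; DRP itself evaluates the Go template, and `pick_bootfile` is a made-up name.

```shell
#!/bin/sh
# pick_bootfile USER_CLASS ARCH
# USER_CLASS is the value of DHCP option 77, ARCH the value of option 93.
pick_bootfile() {
    if [ "$1" = "iPXE" ]; then
        echo "default.ipxe"      # client is already running iPXE
    elif [ "$2" = "0" ]; then
        echo "lpxelinux.0"       # architecture 0: legacy BIOS PXE
    else
        echo "bootx64.efi"       # any other architecture is treated as UEFI
    fi
}

# Example: pick_bootfile "" 7
```

The iPXE-friendly template has the same shape, with `ipxe.pxe` and `ipxe.efi` in the last two branches.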


ctrees
2018-01-11 21:55
at the bottom is a summary and screencast... I'll do another where I walk through the code steps for the iso check ux test...

greg
2018-01-11 22:06
Very nice! @ctrees

greg
2018-01-11 22:07
I like the video

ctrees
2018-01-11 22:07
see the click links for the ADD

greg
2018-01-11 22:08
??

ctrees
2018-01-11 22:09
above the video on the rtdocs page are tcXX links... they will jump to the location in the video of the description

greg
2018-01-11 22:09
I just watched the whole thing, but those are helpful too.

ctrees
2018-01-11 22:09
I use it so I don't have to review all the video to remember 'wtf'

greg
2018-01-11 22:09
Very cool

ctrees
2018-01-11 22:12
well... the question I'll have is after I do the code review screencast... when you watch that one I hope you and @vlowther will be able to figure out if we can tie your patterns up to the ui AND if the feature file format will be useful

ctrees
2018-01-11 22:13
I think you'll see the pattern that the css will need... then it's ALOT like a swagger def file (IMHO)

greg
2018-01-11 22:17
@ctrees - I need to play with and learn more.

greg
2018-01-11 22:17
I then need to beat on people to use it.

greg
2018-01-11 22:17
I still don't completely understand that last part with css.

ctrees
2018-01-11 22:18
ok... well wait for my next screencast... I'll go through the details... then you can pull and run yourself

greg
2018-01-11 22:18
I also think I'm about to break you. We are fixing the login code paths to work with cognito. Or maybe a little.

ctrees
2018-01-11 22:18
it took me 3 weeks to get my brain around what React was doing

greg
2018-01-11 22:18
I mean correctly.

greg
2018-01-11 22:18
That part isn't changing.

meshiest
2018-01-11 22:18
the new login prevents the need to go to a separate page

greg
2018-01-11 22:18
:slightly_smiling_face:

ctrees
2018-01-11 22:19
OH... I KNOW it'll all break...

meshiest
2018-01-11 22:19
it's easier to automate than the last time

meshiest
2018-01-11 22:19
I can make it easier to create custom userpools for you too

meshiest
2018-01-11 22:19
so you can test on multiple types of credentials

greg
2018-01-11 22:20
well - that is an internal-ish thing for the SaaS side, but yeah.

ctrees
2018-01-11 22:20
that's why I don't want to put more effort into it unless the css thing gets hooked in

greg
2018-01-11 22:21
@meshiest - we can. While I'm very appreciative of @ctrees for contributing and pushing things, we should be making this more a part of our overall process. :slightly_smiling_face:

ctrees
2018-01-11 22:23
btw... I'm doing the abstraction for 'other reasons' and for sure is not worth the effort unless it's in a CI chain that you like...

ctrees
2018-01-12 00:09
crap... I recorded and pushed up BUT youtube downgraded the res.. or I foo-bar'd a control...

ctrees
2018-01-12 00:09
for what it's worth...


ctrees
2018-01-12 00:10
but it's hard to see what I'm talking about... and I am very hard to follow listening to...

ctrees
2018-01-12 00:11
and I see you (royal you aka RackN) changed the Portal...

ctrees
2018-01-12 00:15
AW... you sucked in that aws stuff into React ? (browser does not redirect now ?)

greg
2018-01-12 00:17
Yes , that is what @meshiest was referencing.

ctrees
2018-01-12 00:17
Yea... maybe a skype session ? or something would be quicker... cause you're right, it's not worth the effort unless you've got a code coordination strategy... my demo is really just pushing the same UX code coord to the docs also...

ctrees
2018-01-12 00:19
yup got that ... and thanks @meshiest I got your message... the code section you pointed out is for another test subject... that section will not work as is..

ctrees
2018-01-12 00:22
again... I THINK all the coordination can be done via attributes in the elements that are then used by REACT during its render...

ctrees
2018-01-12 00:52
This may help:


ctrees
2018-01-12 00:53
or something like:


greg
2018-01-12 02:30
cool links. We have some linting we do currently as part of travis.

ctrees
2018-01-12 03:23
well... just like the swagger issue, there is a TON of 'helpful' libs... but god forbid you have to support them :wink: (the fewer the better)

ctrees
2018-01-12 03:27
you got to use something to key i18n and css theme on... with those maps feature file should be easy to generate via script

zehicle
2018-01-12 03:28
@ctrees I was talking w/ @meshiest today about adding IDs to elements in the UX. It's not a problem - we need to know which elements need IDs ("all" is not a very helpful answer on the first pass) since React does not require them.

ctrees
2018-01-12 03:30
you're right 'all' is stupid :wink: that's where the i18n stuff comes in... if you have to have a language map that hits basically everything the user needs to identify

ctrees
2018-01-12 03:31
I attempted to start at the api tree... as I figure... well heck... all views in the SPA pull the data from that api...

ctrees
2018-01-12 03:31
and you've got a LOT of that in the text already...

ctrees
2018-01-12 03:32
if you had to change that text for i18n then easiest to relate the translation to the api as the 'hard reference'

zehicle
2018-01-12 03:32
COMMUNITY NOTE > today we updated the RackN Auth system to be contained within the app (no redirect). That change _should_ automatically retain your login for 30 days instead of 60 minutes.

ctrees
2018-01-12 03:34
I didn't dig to see if there are some sort of i18n hooks ... from the DOM side I didn't notice anything

greg
2018-01-12 03:39
@ctrees we don't have any of that in place in either system.

ctrees
2018-01-12 03:40
humm... maybe that's the place to start... a spreadsheet with all the GET calls to drp-cli list that the UI calls ... simple start to i18n too ?

ctrees
2018-01-12 03:42
just do KISS, one at a time, till a solid pattern emerges that you-all like ?

greg
2018-01-12 03:42
Well. The weird / powerful part is that it is mostly done as a single component for most things. So it is one get with parameterized inputs. The inputs could be i18n.

zehicle
2018-01-12 03:46
@ctrees I don't understand why you want the API calls for UX testing

ctrees
2018-01-12 03:46
SO it really is just a big json blob... lets start with that... heck... I want that spreadsheet anyway... and that blob is probably where to start... just associate data with ui-id elements as in an i18n effort ?

ctrees
2018-01-12 03:46
the UI is just to show data from the CALLS ?? correct ??

zehicle
2018-01-12 03:48
the UI is getting API data, yes. Each screen will call multiple APIs to build the info

zehicle
2018-01-12 03:48
some more than others. PLUS there are two APIs - DRP and the SaaS

ctrees
2018-01-12 03:51
would you agree that UX testing is about making sure the Human has an Expected Experience ? aka isn't the API data the 'source of truth' for the UI ?

ctrees
2018-01-12 03:57
anyway... if I'm not making sense... give me a skype or a zoom

zehicle
2018-01-12 03:58
I see. makes sense. Was more thinking about render and logic issues, not data

zehicle
2018-01-12 04:00
technically, the dev > network display shows all the API calls the UX makes. we use that all the time in troubleshooting

wdennis
2018-01-12 04:28
@zehicle Is "dev > network" in the browser dev tools, or somewhere in the RackN Portal?

wdennis
2018-01-12 04:29
And, do you do cross-browser testing? (There's more than just Chrome out there...)

meshiest
2018-01-12 04:29
@wdennis dev network is the network view of dev tools

wdennis
2018-01-12 04:30
Where is it? (not a web dev, but would like to see REST calls the UX is making [or not] at times)

meshiest
2018-01-12 04:30
Right clicking on a page and inspecting element should allow you to open the dev tools

wdennis
2018-01-12 04:30
OK

meshiest
2018-01-12 04:30
There should be a tab with network on most modern browsers including edge

wdennis
2018-01-12 04:31
Aha, <ctrl>-click on a Mac :slightly_smiling_face:

wdennis
2018-01-12 04:35
Does not work on Safari v11.0.2 (12604.4.7.1.6)

wdennis
2018-01-12 04:36
Does work on Chrome (v63.0.3239.132)

pierre.romagne
2018-01-12 14:02
has joined #json

romain.lafontaine
2018-01-12 15:27
has joined #json

zehicle
2018-01-12 15:29
welcome @pierre.romagne

romain.lafontaine
2018-01-12 15:34
Hello there

shane
2018-01-12 15:50
@pierre.romagne and @romain.lafontaine - welcome

pierre.romagne
2018-01-12 15:57
\o - hey guys

chermack
2018-01-12 16:51
There is an UbiSoft Slack channel

shane
2018-01-13 00:02
- we hope you'll join us next Tuesday at 11am PST for our 9th online meetup. See the meetup page for agenda and RSVP info: https://www.meetup.com/digitalrebar/events/xmrktnyxcbvb/

ctrees
2018-01-13 23:06
So... nothing is answering a dhcp request... and I'm attempting to debug why...

ctrees
2018-01-13 23:06
catmini:drpisolated msops$ sudo ./dr-provision --static-ip=192.168.88.9 --base-root=/Users/msops/Code/drpfeature/drpisolated/drp-data --local-content="" --default-content=""

ctrees
2018-01-13 23:07
catmini:drpfeature msops$ sudo route -n add -net 255.255.255.255 192.168.88.9

ctrees
2018-01-13 23:07
(the macosx route thing)

ctrees
2018-01-13 23:07
added subnet...

ctrees
2018-01-13 23:08
added subnet via the UX

ctrees
2018-01-13 23:08
saw the request via dhcpdump

ctrees
2018-01-13 23:10

ctrees
2018-01-13 23:11
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8SGP4VH8/subnet_info_via_ux.txt and commented: Deleted message and replaced with snippet


shane
2018-01-14 18:01
@ctrees also make sure you add the `--static-ip=192.168.88.9` (assuming _88.9_ is your DRP instance IP) additionally - make __certain__ that the DHCP instance for your hypervisor (VirtualBox ?) is disabled on that subnet - and vbox often lies about the status - and you may need to completely restart vbox after disabling DHCP to make it actually disable (which might mean a reboot in vbox's case to be *certain*)

ctrees
2018-01-14 19:00
Thanks... all your assumptions are correct, but I have not rebooted (doing now)

ctrees
2018-01-14 19:02
I was experimenting with dhcp client (just letting another maclaptop boot in dhcp mode)... to see the traffic on the network (which I saw via wireshark)

ctrees
2018-01-14 19:03
the drp logs seemed to hand out an IP, but the maclaptop never 'used' it (aka used a self assigned IP)...

ctrees
2018-01-14 19:04
I did see BOOTP malform warning messages in wireshark (but not sure where they came from)

ctrees
2018-01-14 19:11
is the https error just the local redirect ? (login into portal with default user/pw)

ctrees
2018-01-14 19:11

ctrees
2018-01-14 19:13
THAT WORKS @shane THANKS!!

ctrees
2018-01-14 19:19
humm... to my 'surprise' that also fixed the maclaptop dhcp request... I take it the vbox 'route' adjustments basically foo-bar the dhcp server access (aka outbound packets) to the real network port too... somehow...

shane
2018-01-14 19:24
vbox == messy

ctrees
2018-01-14 20:39
macosx == messy

shane
2018-01-14 20:40
macosx + vbox == pulling_hair_out

zehicle
2018-01-15 00:46
seems like --static-ip solves most problems of DRP not answering network requests as expected. sorry I did not suggest it earlier when I saw the thread.

greg
2018-01-15 04:26
especially on a mac.

florent.wagener
2018-01-15 15:42
hi guys, I've set up a physical environment to test what I've done only in a virtual environment. So I am using CentOS 7 on a Dell R620 as my drp server. For now I am trying to do a basic discovery of a ProLiant DL360 Gen9 but I am facing an issue with it. I have set up a subnet as below:
```
{
  "ActiveEnd": "10.0.49.200",
  "ActiveLeaseTime": 60,
  "ActiveStart": "10.0.49.100",
  "Available": true,
  "Enabled": true,
  "Errors": [],
  "Meta": {},
  "Name": "bond0",
  "NextServer": "10.0.49.254",
  "OnlyReservations": false,
  "Options": [
    { "Code": 1, "Value": "255.255.255.0" },
    { "Code": 3, "Value": "10.0.49.1" },
    { "Code": 6, "Value": "10.0.0.9" },
    { "Code": 15, "Value": "http://example.com" },
    { "Code": 28, "Value": "10.0.49.255" },
    { "Code": 67, "Value": "{{if (eq (index . 77) \"iPXE\") }}default.ipxe{{else if (eq (index . 93) \"0\")}}ipxe.pxe{{else}}ipxe.efi{{end}}" }
  ],
  "Pickers": [ "hint", "nextFree", "mostExpired" ],
  "Proxy": false,
  "ReadOnly": false,
  "ReservedLeaseTime": 7200,
  "Strategy": "MAC",
  "Subnet": "10.0.49.254/24",
  "Validated": true
}
```
Unfortunately when loading sledgehammer, I got a kernel panic:
```
VFS: Cannot open root device "live:/sledgehammer.iso" or unknown-block(0,0): error -19
Please append a correct "root=" boot option; here are the available partition:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-693.2.2.el7.x86_64 #1
```
followed by the Call Trace
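
The option 67 template in the subnet definition above picks a boot file from two DHCP options sent by the client. A rough Python restatement of that branching, assuming the options arrive as a mapping from option code to string value (option 77 is the user class, option 93 the client system architecture, where `0` is a legacy x86 BIOS client); the function name is hypothetical and this is not DRP's own code:

```python
def select_bootfile(options):
    """Mirror the if / else-if / else chain of the option 67 template."""
    if options.get(77) == "iPXE":
        # The client is already running iPXE: hand it the iPXE script.
        return "default.ipxe"
    if options.get(93) == "0":
        # Architecture 0 is a legacy BIOS PXE client.
        return "ipxe.pxe"
    # Everything else falls through to the EFI binary.
    return "ipxe.efi"


print(select_bootfile({93: "0"}))
print(select_bootfile({93: "7"}))
```

Note that any architecture value other than `0` ends up on the EFI branch, so the template effectively treats "not BIOS" as "UEFI".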

2018-01-15 15:42
Time to feed the :bear:!

florent.wagener
2018-01-15 15:42
any idea what could cause that ?

florent.wagener
2018-01-15 15:44
If I use `{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}}` the system is just looping in elilo

greg
2018-01-15 15:45
What is the system set to boot with (legacy BIOS or uefi?)

florent.wagener
2018-01-15 15:45
@greg UEFI

greg
2018-01-15 15:46
okay - checking some things

shane
2018-01-15 15:48
(too late, I saw that)

greg
2018-01-15 15:48
nothing obvious

vlowther
2018-01-15 15:49
will test with our local gear

florent.wagener
2018-01-15 15:49
I'm gonna try on a different server to see if it's not a driver issue or something...

shane
2018-01-15 15:50
what kind of NIC do you have in that server ?

florent.wagener
2018-01-15 15:52
So I am using `HP Ethernet 10Gb 2-port 561FLR-T Adapter`

greg
2018-01-15 15:54
@florent.wagener - give us a second. We don't test UEFI every day. We are resetting some stuff to get there. It may be a few minutes.

florent.wagener
2018-01-15 15:55
@greg thanks, no problem :slightly_smiling_face:

florent.wagener
2018-01-15 16:24
So I just tried on a Dell PowerEdge R620 without UEFI configured and it worked like a charm. However when switching to UEFI the boot failed: `PXE-E23 Client received TFTP error from server`

2018-01-15 16:24
Time to feed the :bear:!

greg
2018-01-15 16:36
Yeah - we are looking into it. We think we broke something in the latest DHCP/TFTP changes for UEFI. Should have a fix in a bit.

florent.wagener
2018-01-15 16:46
I suppose I will have to update the solution ? If so that's cool, a new test for me to do :slightly_smiling_face:

greg
2018-01-15 16:47
We are finding issues with UEFI. Once we get a fix, you should be able to update DRP and maybe the default content, though it is looking like DRP. Then both should work.

florent.wagener
2018-01-15 17:24
alright !

florent.wagener
2018-01-15 19:27
Looks like I killed the channel :slightly_smiling_face:

shane
2018-01-15 19:28
why! why! Why did you have to go and kill the channel !! ?? ( :sob: )

zehicle
2018-01-15 19:49
you'll have to say something scarier than UEFI... something like ITIL

zehicle
2018-01-15 19:49
or GDPR

florent.wagener
2018-01-15 19:57
agile !

viktor.ekmark
2018-01-15 20:47
has joined #json

vlowther
2018-01-15 20:59
@florent.wagener I duplicated your issue -- there seem to be some behavioural differences between how the kernel and initrds are loaded with ipxe in EFI mode and ipxe in legacy PXE mode -- with the subnet configured to use ipxe both ways, I can boot a qemu VM running SeaBIOS into sledgehammer just fine, whereas a qemu VM running tianocore crashes in the way you describe.

florent.wagener
2018-01-15 21:03
@vlowther Great, any fix in head ?

vlowther
2018-01-15 21:03
Not yet.

vlowther
2018-01-15 21:04
I have duplicated it (after stumbling across all sorts of nifty ways in which UEFI almost but not quite implements pxe sanely)

shane
2018-01-15 21:06
@viktor.ekmark welcome

marc.heckmann
2018-01-15 21:48
For the record, elilo didn't work any better w/ UEFI, ipxe actually got further. elilo wouldn't even boot into the kernel: It complained about an error on line 6 of the template

greg
2018-01-15 21:50
we are finding that elilo just may not work at all anymore. @marc.heckmann

greg
2018-01-15 21:50
by we, I mean @vlowther

marc.heckmann
2018-01-15 21:51
I recall w/ our old cobbler based solution that the bootx64.efi from CentOS 6 worked, but not the one from CentOS 7

marc.heckmann
2018-01-15 21:51
I didn't dig any deeper as to why unfortunately

marc.heckmann
2018-01-15 21:52
Something like that anyway, it's been a while

vlowther
2018-01-15 21:52
so, the tl;dr for ipxe is that when booting via legacy BIOS, you do not have to pass an initrd= argument to the command line -- ipxe does that natively. For whatever reason, it does not do that automatically when booting via UEFI.
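
The difference described above can be sketched as a small generator for an iPXE boot script. This assumes the standard iPXE `kernel`, `initrd`, and `boot` commands; the function name, file names, and URL are hypothetical, and the rule for when to add `initrd=` is taken only from the message above:

```python
def ipxe_script(base_url, kernel, initrd, args, uefi):
    """Build an iPXE script; under UEFI, name the initrd on the kernel command line."""
    cmdline = list(args)
    if uefi:
        # Legacy BIOS iPXE passes the initrd implicitly; UEFI needs it spelled out.
        cmdline.insert(0, "initrd=" + initrd)
    lines = [
        "#!ipxe",
        "kernel {}/{} {}".format(base_url, kernel, " ".join(cmdline)).rstrip(),
        "initrd {}/{}".format(base_url, initrd),
        "boot",
    ]
    return "\n".join(lines)


# Hypothetical file names and URL, for illustration only.
print(ipxe_script("http://example.test", "vmlinuz0", "stage1.img",
                  ["console=ttyS1,115200"], uefi=True))
```

Keeping the firmware type as an explicit flag makes it possible to render both variants from one template and compare them side by side.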

vlowther
2018-01-15 21:53
It is rather annoying.

marc.heckmann
2018-01-15 21:57
ok, thanks, good to know

vlowther
2018-01-15 22:02
I am looking to see if I can make elilo work

vlowther
2018-01-15 22:02
the issue with it is that it does not have a native IPAPPEND feature (like pxelinux), nor can it fake it (like we can with ipxe)

vlowther
2018-01-15 22:03
We need that to know which nic we booted from, and therefore which nic to DHCP on to fetch the second-stage bootloader.

marc.heckmann
2018-01-15 22:05
ok, I'll have to look at exactly what we did w/ Cobbler, 'cause I'm pretty sure we used IPAPPEND on UEFI.

marc.heckmann
2018-01-15 22:17
So after investigation, it turns out that the `bootx64.efi` that we're using w/ Cobbler is actually Grub. It supports the `macappend 2` statement. Any reason why Grub couldn't be used w/ DRP?

vlowther
2018-01-15 22:29
grub1 or grub2?

vlowther
2018-01-15 22:30
It has been a few years since I have tried them for booting UEFI systems over the network.

vlowther
2018-01-15 22:33
With the arch we were using at the time, they were flakier than elilo

vlowther
2018-01-15 22:34
they tended to have issues relocating kernels and initrds when they got too big

vlowther
2018-01-15 22:34
where elilo did not.

vlowther
2018-01-15 22:43
Otherwise, there is nothing in principle preventing the use of grub2

vlowther
2018-01-15 22:43
it is just another binary to include and another set of templates to expand.

vlowther
2018-01-15 22:44
I am more tempted to just standardize on ipxe, though.

marc.heckmann
2018-01-15 22:44
`GNU GRUB 0.97` is what I'm seeing in the `strings` output

marc.heckmann
2018-01-15 22:46
Like I said, whatever was shipping w/ CentOS 7 wasn't really working for us. Not sure if that was GRUB 2 or not

vlowther
2018-01-15 22:46
ah, the old version of grub that is no longer maintained or supported by upstream.

marc.heckmann
2018-01-15 22:46
I guess that's it


vlowther
2018-01-15 22:47
I liked it, but then grub2 came along which was much more modular. And brittle. And flaky.

marc.heckmann
2018-01-15 22:49
In any case, it's sounding like standardizing on ipxe is the better way to go

ctrees
2018-01-16 01:12
In: http://provision.readthedocs.io/en/latest/doc/integrations/krib.html#configure-with-the-ux there is: centos-7-install -> runner-service:Success. But in my endpoint UI I can't find "runner-service". I loaded krib content.

greg
2018-01-16 01:31
You need task-library

ctrees
2018-01-16 01:39
ok... thanks

shane
2018-01-16 02:10
@ctrees my bust - I forgot to document that piece :flogs_self:

ctrees
2018-01-16 02:11
np... cause without your doc I wouldn't have gotten this far...

ctrees
2018-01-16 02:12
but I did something... dr-provision2018/01/16 02:08:41.443797 [4555:23165]frontend [audit]: /home/travis/gopath/src/github.com/digitalrebar/provision/frontend/frontend.go:636 [4555:23165]Authenticated rocketskates http2: server: error reading preface from client 192.168.88.9:62336: read tcp 192.168.88.9:8092->192.168.88.9:62336: read: connection reset by peer

shane
2018-01-16 02:12
did you use `tip` version ?

ctrees
2018-01-16 02:13
not sure... checking

shane
2018-01-16 02:14
`drpcli info get`

shane
2018-01-16 02:14
or `dr-provision --version`

ctrees
2018-01-16 02:16
catmini:drpisolated msops$ ./dr-provision --version
dr-provision2018/01/16 02:16:37.366528 Version: v3.5.0-tip-49-6aea7e647d6cb992e22a141ce1411a3b3af73095

shane
2018-01-16 02:16
yeah - you're running tip

shane
2018-01-16 02:17
you're 49 commits ahead of v3.5.0 stable
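
The reading of the version string above can be sketched as a small parser. It assumes the layout `v<tag>[-<label>]-<commits>-<hash>`, which is inferred only from the version strings that appear in this log; the function and field names are hypothetical:

```python
import re

VERSION_RE = re.compile(
    r"^v(?P<base>\d+\.\d+\.\d+)"      # tagged release, e.g. 3.5.0
    r"(?:-(?P<label>[A-Za-z]+))?"     # optional label such as "tip"
    r"-(?P<ahead>\d+)"                # commits since the tag
    r"-(?P<commit>[0-9a-f]+)$"        # git commit hash
)


def parse_version(version):
    """Split a version string into base tag, label, commit count, and hash."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("unrecognised version string: " + version)
    info = match.groupdict()
    info["ahead"] = int(info["ahead"])
    return info


info = parse_version("v3.5.0-tip-49-6aea7e647d6cb992e22a141ce1411a3b3af73095")
print(info["base"], info["label"], info["ahead"])
```

A non-zero `ahead` value, or a `tip` label, is the quick signal that the binary is not a tagged stable build.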

shane
2018-01-16 02:17
back rev to 3.5.0 stable please

ctrees
2018-01-16 02:17
I was attempting to force too...

ctrees
2018-01-16 02:18
you don't happen to have the git checkout string in your head :wink:

shane
2018-01-16 02:20
```
pkill dr-provision
cd <wherever_your_install_path_is>
curl -s get.rebar.digital/stable | bash -s -- install --force --isolated
```
(assuming isolated mode, otherwise, drop that flag)

shane
2018-01-16 02:20
8d49a776c3d7b40d2af07a356e7b33d2e2b99ca2

shane
2018-01-16 02:21
hmm - that's 3.4.1 version - not sure why stable isn't pulling v3.5.0 - checking on that

ctrees
2018-01-16 02:22
I'll just switch to the 3.5.0 tag

shane
2018-01-16 02:22
3.4.1 should be stable for KRIB

ctrees
2018-01-16 02:25
oh... I forgot... I did the install NOT a clone... guess I'm doing it your way

shane
2018-01-16 02:26
ah - my fault w/ versions ... I had a stupid startup script launching /usr/local/bin/dr-provision instead of my local isolated install binary ... :smacks_head:

shane
2018-01-16 02:28
so - there you go: `8dd3ac9c62a2555d315e07f5a190f2230e3a7ca7`

wdennis
2018-01-16 02:53
Just dropping by to say that KRIB is great stuff... Having loads of fun learning k8s on metal!

ctrees
2018-01-16 02:56
what are the things you are loading on k8s to learn ? mainly curious ? I'm headed toward a blender render cluster thing right now....

wdennis
2018-01-16 03:02
Mostly just learning the base system... How to deploy apps, publish them, do ingress... how to troubleshoot the problems I encounter (create)

wdennis
2018-01-16 03:03

wdennis
2018-01-16 03:04
^^^ my architecture

wdennis
2018-01-16 03:05

wdennis
2018-01-16 03:08
Learned some stuff about KRIB too - like adding a node after 24h is a bit painful :wink: (join token expires)
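
The expiry problem mentioned above comes down to a simple age check. A minimal sketch, assuming the token's creation time is known and using the 24h lifetime from the message above as the default; the function name is hypothetical and nothing here talks to a real cluster:

```python
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=24)  # the 24h lifetime mentioned above


def token_expired(created_at, now=None, ttl=TOKEN_TTL):
    """Return True when a join token created at created_at is older than ttl."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= ttl
```

A check like this before adding a node would tell you up front whether the stored token needs renewing.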

ctrees
2018-01-16 03:14
yup... I read that :wink: and they have a new login method ...

ctrees
2018-01-16 03:15
so @shane I get : Control Workflow CANNOT ACCESS: Updated Version Required! Workflow allows users to define automatic transitions between Stages on a Profile basis. Workflows require use of Stages and Tasks. In the Workflow View of UI....

ctrees
2018-01-16 03:18
where I didn't get with tip... AND I think my error on tip may have been something stuck in the UI (as when I switched the old UI was 'stuck' the same way... ended up clearing browser.

greg
2018-01-16 03:20
log into the SaaS and see if that clears it up.

ctrees
2018-01-16 03:20
ok..

ctrees
2018-01-16 03:23
I was logged in... but I loaded the task-library and krib "Content" and workflow is working... thanks

ctrees
2018-01-16 14:53
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8TL685AR/drp-krib-test.png and commented: Was attempting the krib demo... got nodes into sledgehammer-wait but when I put into k8s-cluster-install... they seemed to not make it past pxe ?? (when I rebooted the node)

ctrees
2018-01-16 14:59
From what I can tell.. no ks file is generated in drp-data/tftpboot/machines BUT that's what the machine was looking for... (I rebooted and took some snaps)

ctrees
2018-01-16 15:01
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8TLC5469/drp-krib-test-reboot.png and commented: Begin of reboot

ctrees
2018-01-16 15:02
@ctrees uploaded a file: https://rackn.slack.com/files/U62R1805P/F8TLCMSQ5/drp-krib-test-no-ks.png and commented: looking for ks

greg
2018-01-16 15:02
What bootenv is the machine in?

ctrees
2018-01-16 15:04

greg
2018-01-16 15:05
Are those stages and bootenvs available?

ctrees
2018-01-16 15:07
and I've got no job-logs... ... OH... it's IN the UI, but not in the folder :wink:

ctrees
2018-01-16 15:08
I was going to ask about that... same as templates... I don't see them in the digitalrebar/templates but they are in the UI... I take it they are held in the saas-content/*.yaml ?

greg
2018-01-16 15:09
yes.

greg
2018-01-16 15:09
they are read-only in memory content.

ctrees
2018-01-16 15:11
ahw... the other (aka tip) I was getting bin log in digitalrebar/templates

ctrees
2018-01-16 15:12
woops drp-data/job-logs

ctrees
2018-01-16 15:13
and json in drp-data/digital-rebar/jobs/*.json

greg
2018-01-16 15:13
that is where it should go

greg
2018-01-16 15:14
jobs are "ephemeral"

greg
2018-01-16 15:14
so they get stored in the "database"

ctrees
2018-01-16 15:14
ok... so know thing between 3.5.0 and tip... and I guess I need to 'download' all the stage / tasks so the ks will generate ?

greg
2018-01-16 15:15
you need to make sure that the content package is in.

ctrees
2018-01-16 15:16
catmini:drpfeature msops$ ls -alu drpisolated/drp-data/saas-content/
total 208
drwxr-xr-x+ 5 msops staff 160 Jan 16 09:16 .
drwxr-xr-x+ 9 msops staff 288 Jan 16 09:16 ..
-rw-r--r--+ 1 msops staff 71999 Jan 15 22:04 default.yaml
-rw-r--r--+ 1 root staff 14259 Jan 15 21:20 krib-v1.4.0-0-23e369560a623da0e69a08b925c2815343c9d987.yaml
-rw-r--r--+ 1 root staff 14161 Jan 15 21:20 task-library-v1.4.0-0-23e369560a623da0e69a08b925c2815343c9d987.yaml
catmini:drpfeature msops$

ctrees
2018-01-16 15:32
so does the get 'formed' by drp-data/digitalrebar/machines/f5335...248.json ? (seems that's where the node got stuck) but I see no job generated

shane
2018-01-16 15:36
@ctrees - yes, all content is dynamically rendered on request by DRP - and served from a read-only in-memory layer - you won't see the created elements on the DRP endpoint's filesystem

shane
2018-01-16 15:37
any templates will be instantiated, template pieces filled in, and made available to the Machine at request time
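
The render-on-request behaviour described above can be pictured with a small in-memory store. This is only a sketch: Python's `string.Template` stands in for whatever template engine DRP uses, and the class name, request path, and parameter are hypothetical:

```python
from string import Template


class InMemoryRenderer:
    """Hold templates in memory and expand them only when a path is requested."""

    def __init__(self):
        self._templates = {}  # request path -> Template

    def register(self, path, text):
        self._templates[path] = Template(text)

    def fetch(self, path, params):
        # Nothing is written to disk; the result exists only for this request.
        return self._templates[path].substitute(params)


renderer = InMemoryRenderer()
renderer.register("machines/example/compute.ks", "url --url $install_url\n")
print(renderer.fetch("machines/example/compute.ks",
                     {"install_url": "http://example.test/centos-7/install"}))
```

Because nothing is ever written out, listing the data directory shows no kickstart file even though a client that requests the exact path receives one.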

2018-01-16 16:18
hrmmm

greg
2018-01-16 16:19
@bsdwatch?

2018-01-16 16:19
signed in with wrong github account

2018-01-16 16:19
grrrrr

2018-01-16 16:21
no way back i guess

2018-01-16 16:27
there we go

2018-01-16 16:28
ok anyway.... fixed that.... now...

greg
2018-01-16 16:39
FYI - I'm going to start on a release for DRP 3.6. With the completion of the UEFI boot fixes, I'm going to start the process. Tip will be pre-release shortly.

vlowther
2018-01-16 16:40
@ctrees Dynamic rendered templates can contain sensitive information, so they cannot be browsed. If you know exactly what the generated filename is, you can pull it directly with http or tftp.

vlowther
2018-01-16 16:41
It is totally not because I was too lazy to implement a full filesystem overlay with directory merging.

ctrees
2018-01-16 16:46
[GIN] 2018/01/16 - 10:41:48 | 200 | 10.291µs | 192.168.88.9 | OPTIONS /api/v3/machines/db1dcb0f-d0b6-4afb-9da9-e62b62a68e24
[GIN] 2018/01/16 - 10:41:48 | 422 | 2.666013ms | 192.168.88.9 | PATCH /api/v3/machines/db1dcb0f-d0b6-4afb-9da9-e62b62a68e24

ctrees
2018-01-16 16:47
something I'm doing in the UI ... I think... when I attempt run the krib

shane
2018-01-16 16:48

shane
2018-01-16 16:49
also - you have your DRP endpoint (192.168.88.9) set as your default GW - presumably you've enabled `ip_forwarding` through your endpoint ?

ctrees
2018-01-16 16:51
now that is starting to make sense... I used the auto-gen one this time...

ctrees
2018-01-16 16:59
so anaconda fails to fetch kickstart but I can browse and get it ?

shane
2018-01-16 17:35
@ctrees did you modify the bootenv and associated templates in any way ?

ctrees
2018-01-16 17:43
... I was attempting to follow that key paragraph in your doc...


ctrees
2018-01-16 17:44
then I go back to the rob video...

ctrees
2018-01-16 17:46
I'll push what I've attempted... give me a sec... (I think you may be right... I've got a local subnet route problem... probably the vm can't talk the same way to the api as the browser... I have bridge adaptor but who knows WTF vbox / mac is doing...)


shane
2018-01-16 18:59
our meetup is starting in a few minutes - the agenda is here: https://docs.google.com/document/d/1b72e1dIAJgsfvJbJUpBG9Jmhq6WC1f5SYAOlkcwT0KQ


wdennis
2018-01-16 19:00
@shane Zoom link?

shane
2018-01-16 19:00

shane
2018-01-16 19:01
it's always posted in the http://meetup.com posting

shane
2018-01-16 19:53

2018-01-16 19:56
thanks for the link

shane
2018-01-16 19:57
you betcha

shane
2018-01-16 19:58
any feedback appreciated - if you have questions please drop them here - @wdennis is now a PRO at KRIB - and @ctrees is getting there too :slightly_smiling_face:

wdennis
2018-01-16 20:00
We need to document that "renew the join token in the profile" stuff (enabling add-on nodes well past initial cluster bringup)

vlowther
2018-01-16 20:07
victor@m4700:~/gocode/src/github.com/digitalrebar/provision/pacman (master) $ drpcli profiles get global param package-repositories
[
  { "installSource": true, "os": [ "centos-7" ], "tag": "centos-7-install", "url": "http://192.168.124.11:3002" },
  { "installSource": true, "os": [ "sledgehammer/f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15" ], "tag": "sledgehammer", "url": "http://192.168.124.11:3001" }
]
victor@m4700:~/gocode/src/github.com/digitalrebar/provision/pacman (master)

vlowther
2018-01-16 20:07
That is the package-repositories attrib I used for the no-local-repos demo

vlowther
2018-01-16 20:07
One thing I did not call out is that all the files I used for the PXE process were also in the remote repos

vlowther
2018-01-16 20:09
and that dr-provision transparently proxied all the required TFTP requests for the kernel and initrd to the remote repos sledgehammer and the centos-7 install were configured to use.
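
To make the shape of that `package-repositories` value concrete, here is a small lookup over the same two entries. The selection rule (first entry marked `installSource` whose `os` list contains the name) is an assumption for illustration, not DRP's documented lookup, and the function name is hypothetical:

```python
PACKAGE_REPOSITORIES = [
    {"installSource": True, "os": ["centos-7"], "tag": "centos-7-install",
     "url": "http://192.168.124.11:3002"},
    {"installSource": True,
     "os": ["sledgehammer/f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15"],
     "tag": "sledgehammer", "url": "http://192.168.124.11:3001"},
]


def install_source_for(os_name, repos):
    """Return the URL of the first install-source repo that lists os_name, else None."""
    for repo in repos:
        if repo.get("installSource") and os_name in repo.get("os", []):
            return repo["url"]
    return None


print(install_source_for("centos-7", PACKAGE_REPOSITORIES))
```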

ctrees
2018-01-16 20:50

ctrees
2018-01-16 20:50
ssh-access: user1: ssh <user_1_key> user@krib user2: ssh <user_2_key> user@krib

shane
2018-01-16 20:50
that's a standard ssh key injection process - we've documented that in a video on youtube

shane
2018-01-16 20:51

ctrees
2018-01-16 20:51
I guess my question was... is that setup ASSUMED before the krib doc? (that may be what I was missing)

shane
2018-01-16 20:52
it's not really needed - as we can operate w/out the keys - we don't use SSH

shane
2018-01-16 20:52
but it's for your convenience to be able to log in to the Kube master if necessary

shane
2018-01-16 20:52
again - you don't need to SSH to kube master - you can pull the profile config from DRP - and use a remote `kubectl` tool to manage the cluster

shane
2018-01-16 20:53
this allows you to build a cluster w/ zero login needed for a more secure environment

ctrees
2018-01-16 20:53
well then I'm sort of out of ideas why the node can't get to the ks

ctrees
2018-01-16 20:55
anaconda says it can't get the file (and hangs) but curl can.. (and browser)...

shane
2018-01-16 20:56
is your curl test a different system from the Machine you're trying to provision ?

shane
2018-01-16 20:56
it might be a Machine --> DRP Endpoint connection problem

ctrees
2018-01-16 20:56
I don't actually know cause I can't get into that machine :wink:

shane
2018-01-16 20:56
is this virtualbox, physical infra, etc ?

ctrees
2018-01-16 20:57
vbox... but I think first I'm going through the ssh stuff... I need to understand that anyway...

shane
2018-01-16 20:57
are you able to do just a simple c7 install to a VM in vbox using this network setup ?

ctrees
2018-01-16 20:58
yes... but I didn't know what users it setup... aka ssh as I see the task fly by

shane
2018-01-16 20:59
that's "documented" in the kickstart how it gets built up with user creds

shane
2018-01-16 20:59
the root user in c7 case should have pw of "RocketSkates" (I believe)

ctrees
2018-01-16 20:59
ok... thanks... I'll peek at that...

shane
2018-01-16 20:59
accessible from the "console" of the VM

ctrees
2018-01-16 21:00
(aka good idea I should have thought of)

shane
2018-01-16 22:19
- anyone local to the SF Bay Area - I'll be presenting Digital Rebar on Thursday evening at the BayLISA meetup group. More details:


shane
2018-01-16 22:43
Also - here's the replay video from today's meetup - which was crammed full of goey ooye goodness ... Did Victor avoid the Demo Gods wrath by presenting on THREE topics - combining TWO pieces of brand spanking new functionality combined in to a ONE demo? You're going to have to watch to find out ... :slightly_smiling_face: https://youtu.be/dtCKxueGEic

ctrees
2018-01-16 23:00
I'm guessing I have a network problem... I've done most things (other than fire up wireshark and look at all the traffic)... but I'm pretty sure @shane was correct... my machine -> endpoint does not talk... yet it gets through ipxe... AND (I think) sledgehammer... but fails on anaconda

ctrees
2018-01-16 23:00
dr-provision2018/01/16 22:56:11.917015 Found our lease for strat: MAC token 08:00:27:66:d8:66, will use it
dr-provision2018/01/16 22:56:11.918219 Received option: OptionDHCPMessageType: 3
dr-provision2018/01/16 22:56:11.918250 Received option: OptionParameterRequestList:
dr-provision2018/01/16 22:56:11.918272 Received option: OptionVendorClassIdentifier: anaconda-Linux 3.10.0-693.el7.x86_64 x86_64
dr-provision2018/01/16 22:56:11.918681 xid 0x4f5f8353: Request handing out: 192.168.88.10 to 08:00:27:66:d8:66 via 192.168.88.9
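
When chasing this kind of problem it can help to pull the lease hand-outs from the log instead of reading it by eye. A minimal sketch, assuming the `Request handing out: <ip> to <mac> via <server>` wording shown in the log above stays stable; the function name is hypothetical:

```python
import re

HANDOUT_RE = re.compile(
    r"Request handing out: (?P<ip>\S+) to (?P<mac>\S+) via (?P<server>\S+)"
)


def find_handouts(log_text):
    """Return a list of (ip, mac, server) tuples for every lease hand-out line."""
    return [
        (m.group("ip"), m.group("mac"), m.group("server"))
        for m in HANDOUT_RE.finditer(log_text)
    ]


SAMPLE_LOG = (
    "dr-provision2018/01/16 22:56:11.918219 Received option: OptionDHCPMessageType: 3\n"
    "dr-provision2018/01/16 22:56:11.918681 xid 0x4f5f8353: Request handing out: "
    "192.168.88.10 to 08:00:27:66:d8:66 via 192.168.88.9\n"
)
print(find_handouts(SAMPLE_LOG))
```

In this log the hand-out succeeds, which suggests DHCP itself is working and the failure is in the later kickstart fetch.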

ctrees
2018-01-16 23:01
now I'm attempting to figure out what 'in networking' is different when anaconda fetches the kickstart...

zehicle
2018-01-17 15:05
I've added some keywords to slack that will bring up links when we type them: FAQ, KRIB, meetup, and issue.


zehicle
2018-01-17 15:05
FAQ


zehicle
2018-01-17 15:10
and :cloudia:

zehicle
2018-01-17 15:17
please let us know if we need other quick links.

shane
2018-01-17 15:19
show me the quickstart

2018-01-17 15:19

2018-01-17 15:47
ok how do we set reservations for existing systems / vms





2018-01-17 15:52
ahhh the mac is the token... got it

shane
2018-01-17 16:10
@shane set the channel topic: For SlackBot command help, type: "slackbot help"

2018-01-17 16:10
Available Commands: FAQ, $FAQ, $faq, $KRIB, $krib, $meetup, $Meetup, $issue, $Issue, $issues, $Issues, $quickstart, $QuickStart

shane
2018-01-17 16:11
$quickstart

2018-01-17 16:11

2018-01-17 16:24
$KRIB

2018-01-17 16:24
pffftttt

2018-01-17 16:25
ignore that

shane
2018-01-17 16:25
Hmm - might be because you're not a Slack user and coming in through a sameroom

shane
2018-01-17 16:25
$KRIB


2018-01-17 16:27
i am a slack user, just not logged into it now

shane
2018-01-17 16:27
doesn't count unless you're logged in ... :slightly_smiling_face:

2018-01-17 16:28
LOL

kamp.scott
2018-01-17 16:47
$KRIB


kamp.scott
2018-01-17 16:47
hah

zehicle
2018-01-17 17:02
welcome @kamp.scott!

wdennis
2018-01-17 17:13
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F8UAC0HUJ/derpy.jpg and commented: We need a "D[e]RP" emoji...

kamp.scott
2018-01-17 18:06
@zehicle thanks

kamp.scott
2018-01-17 18:07
So KRIBs on 5 large vms?

kamp.scott
2018-01-17 18:07
Spun up via rebar

kamp.scott
2018-01-17 18:08
Curious also why debian and ubuntu vms installed via rebar have no console access

kamp.scott
2018-01-17 18:09
I think I might go redeploy a production rebar first

greg
2018-01-17 18:28
Because the console= kernel param is not correct for your VM environment. I would guess. You can set the parameter `kernel-console` to `console=ttyS1,115200` or whatever it needs to be for your environment.

greg
2018-01-17 18:28
That can be set in the global profile.

greg
2018-01-17 18:28
The above is what we use for http://packet.net (as an example).

greg
2018-01-17 18:37
@kamp.scott - sorry forgot to note you

kamp.scott
2018-01-17 18:39
@greg I'll try it thanks

florent.wagener
2018-01-17 18:49

vlowther
2018-01-17 18:55
Cool. I have further cleanups incoming -- the T320 I keep around is significantly pickier than qemu with tianocore.

florent.wagener
2018-01-17 19:05
@vlowther aaaah my bad, I didn't see that I was using Legacy BIOS :smile: Let me try again with UEFI !

vlowther
2018-01-17 19:11
ok -- if you could capture a tcpdump of the DHCP traffic while that is happening and send that to me that would help.

vlowther
2018-01-17 19:11
or a dhcpdump, if you have that tool lying around handy. :slightly_smiling_face:

vlowther
2018-01-17 19:12
has been staring at dhcp packets for the last couple of days trying to bang all this stuff out.

florent.wagener
2018-01-17 19:13

florent.wagener
2018-01-17 19:13
let me check :slightly_smiling_face:

vlowther
2018-01-17 19:16
yeah, that looks familiar.

vlowther
2018-01-17 19:16
I will have a patch out shortly.

kamp.scott
2018-01-17 19:21
so curious now, can i go from standalone to a production install without re-installing ?

florent.wagener
2018-01-17 19:21
@florent.wagener uploaded a file: https://rackn.slack.com/files/U8FAN7PLK/F8UCYJV8S/tcpdump_uefi_boot_failure.txt and commented: @vlowther here's the tcpdump of 2 boot sequences.

shane
2018-01-17 19:43
@kamp.scott - not really a "supported" path for that. However, you should be able to do it with the following hackery ...
1. back up your current `drp-data` directory (eg `tar -czvf /root/drp-isolated-backup.tgz drp-data/`)
2. stop the running service with `pkill dr-provision`
3. perform a fresh install on the same host, without the `--isolated` flag
4. follow the start up scripts setup - BUT do NOT start the `dr-provision` service at this point
5. copy the `drp-data/*` directories recursively to `/var/lib/dr-provision` (eg: `unalias cp; cp -ra drp-data/* /var/lib/dr-provision/`)
6. make sure your start up scripts are in place for your production mode (eg: `/etc/systemd/system/dr-provision.service`)
7. start the new production version with `systemctl start dr-provision.service`
8. verify everything is running fine
9. delete the `drp-data` directory (suggest retaining the backup copy for later just in case)
10. YMMV ... buyer beware ... I didn't fully test this process ... don't run with scissors, sharp objects may poke you, etc...
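
The backup and copy steps (1 and 5) can be sketched in Python for anyone who prefers a script over the one-liners. This is a rough, untested-in-production illustration: the function name is hypothetical, it does not stop or start any service, and unlike `cp -ra` it does not preserve file ownership:

```python
import shutil
import tarfile
from pathlib import Path


def backup_and_copy(data_dir, backup_path, target_dir):
    """Archive data_dir (step 1), then copy its contents into target_dir (step 5)."""
    data_dir = Path(data_dir)
    with tarfile.open(str(backup_path), "w:gz") as archive:
        # Store entries under the directory's own name, like `tar -czvf ... drp-data/`.
        archive.add(str(data_dir), arcname=data_dir.name)
    # dirs_exist_ok lets the copy merge into an existing target (Python 3.8+).
    shutil.copytree(str(data_dir), str(target_dir), dirs_exist_ok=True)
```

A call such as `backup_and_copy("drp-data", "/root/drp-isolated-backup.tgz", "/var/lib/dr-provision")` mirrors the paths used in the steps above; taking the backup first means a failed copy still leaves a restorable archive.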

kamp.scott
2018-01-17 19:45
@shane hrmmmm might just consider a reinstall....

shane
2018-01-17 19:46
this is a "reinstall" - just pulling over any content and configurations and machine data from previous provisioning activities

kamp.scott
2018-01-17 20:19
so has the debian / ubuntu disk partitions issue been fixed in the newest version ?

kamp.scott
2018-01-17 20:20
since in a vm they are xvdX and not sdX

shane
2018-01-17 20:21
change operating system disk

shane
2018-01-17 20:22
did you try that ?

kamp.scott
2018-01-17 20:30
@greg yes that works for VMs but not for bare metal

kamp.scott
2018-01-17 20:30
and yes some of my keys on the keyboard are on holiday

shane
2018-01-17 20:31
bare metal most likely has a different device name than a VM - can you please verify the device names in a bare metal install ?

kamp.scott
2018-01-17 20:32
@shane linux is sda sdb sdc on baremetal

kamp.scott
2018-01-17 20:33
hence my issue if i use rebar for a VM it fails... unless globally is set to xvda

kamp.scott
2018-01-17 20:36
@shane so just .... curl -fsSL get.rebar.digital/stable | bash -s -- install

shane
2018-01-17 20:44
ah - so what you're probably doing is applying the "operating-system-disk" in the global profile - and trying to provision BOTH VMs and bare metal - that won't work

shane
2018-01-17 20:45
you need to apply the Param `operating-system-disk` to individual machines. One way to do that is create a Profile - say "virtual-machines" and add the Param to that Profile - make any customizations in that Profile - then add that profile to the VMs

shane
2018-01-17 20:45
similarly - create a Profile named "bare-metal" and add any customizations specific to the bare metal hosts in that Profile - now apply that profile to the bare metal Machines

shane
2018-01-17 20:45
(remember to remove the global Param `operating-system-disk` when doing it this way)
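
A minimal sketch of the idea above, written in Python rather than anything DRP ships. It assumes a Param is looked up in the Profiles attached to a Machine first and in the global profile last; `resolve_param` and the `PROFILES` table are hypothetical names for illustration, and the values `xvda` and `sda` are taken from the device names mentioned in this thread.

```python
from typing import Dict, List, Optional

# Hypothetical profile data: profile name -> Params map.
PROFILES: Dict[str, Dict[str, str]] = {
    "virtual-machines": {"operating-system-disk": "xvda"},
    "bare-metal": {"operating-system-disk": "sda"},
    "global": {},  # the global Param has been removed, as suggested above
}


def resolve_param(machine_profiles: List[str], key: str) -> Optional[str]:
    """Return the first value found in the machine's profiles, then in global.

    The lookup order is an assumption made for this sketch.
    """
    for name in [*machine_profiles, "global"]:
        params = PROFILES.get(name, {})
        if key in params:
            return params[key]
    return None


vm_disk = resolve_param(["virtual-machines"], "operating-system-disk")
metal_disk = resolve_param(["bare-metal"], "operating-system-disk")
print(vm_disk, metal_disk)
```

With the global Param removed, a Machine that carries neither Profile resolves to nothing, which is why each class of machine needs its own Profile attached.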

kamp.scott
2018-01-17 20:46
@shane you guys provision vms and baremetal don't you ?

shane
2018-01-17 20:47
yes - but our focus is on bare metal - VMs are for testing in our world (they have their place) - but our core value is managing bare metal systems

shane
2018-01-17 20:47
as you've tested - VMs work too

kamp.scott
2018-01-17 20:47
so is mine however there's just some things i can do in docker in a vm

shane
2018-01-17 20:47
but you need to provide customizations to different classes of machines - that's what Profiles and Params give you

shane
2018-01-17 20:48
you just have to apply them to the right thing to do the different things correctly

kamp.scott
2018-01-17 20:48
yeah... rebar's starting to make me feel a bit stupid

kamp.scott
2018-01-17 20:48
have to get my head around it better

kamp.scott
2018-01-17 20:49
omg this$%R$^%^& keyboard

kamp.scott
2018-01-17 20:52
which is why i considered a full reinstall. i've since realized if you shut down rebar any vm launched with it loses its ip in time


kamp.scott
2018-01-17 21:07
@shane nope fail..... doesn't seem that works.... subnets iso machines workflow all missing

ctrees
2018-01-17 21:07
I was attempting to assign the discovered machine...

ctrees
2018-01-17 21:07
[drpops@drpe drpisolated]$ ./drpcli machines list | jq '.[].Uuid'
"4678729e-5147-43f6-a569-93b7668b8a40"
[drpops@drpe drpisolated]$ ./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install
Error: ValidationError: machines/4678729e-5147-43f6-a569-93b7668b8a40: Can not change bootenv while in a stage unless forced. old: sledgehammer new centos-7-install
[drpops@drpe drpisolated]$ ./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install --force
Error: ValidationError: machines/4678729e-5147-43f6-a569-93b7668b8a40: Can not change bootenv while in a stage unless forced. old: sledgehammer new centos-7-install
[drpops@drpe drpisolated]$

ctrees
2018-01-17 21:08
woops... should have used snippet... but I was attempting to force the new boot env and could not...

florent.wagener
2018-01-17 21:08
dumb question, is there a default root password to login into sledgehammer ?

shane
2018-01-17 21:08
depends - what type of beer are you talking about ?

shane
2018-01-17 21:08
good beer = password


florent.wagener
2018-01-17 21:09
@shane I only drink belgian beer :slightly_smiling_face:

ctrees
2018-01-17 21:09
I think I got the command line right...

shane
2018-01-17 21:09
ok - that'll worky !

shane
2018-01-17 21:09
sledgehammer credentials: root / rebar1

florent.wagener
2018-01-17 21:09
thanks !

shane
2018-01-17 21:10
but - only at console - unless you've added `access-keys` to inject your own user SSH keys

kamp.scott
2018-01-17 21:10
@shane nevermind, seems ok now. i copied to drp-provision, not dr-provision. fixed

florent.wagener
2018-01-17 21:10
@shane of course.

shane
2018-01-17 21:11
:slightly_smiling_face:

shane
2018-01-17 21:11
I still expect beer ... just sayin'

ctrees
2018-01-17 21:15
./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install --force
Error: ValidationError: machines/4678729e-5147-43f6-a569-93b7668b8a40: Can not change bootenv while in a stage unless forced. old: sledgehammer new centos-7-install
[drpops@drpe drpisolated]$

ctrees
2018-01-17 21:16
same result when: ./drpcli machines bootenv "4678729e-5147-43f6-a569-93b7668b8a40" centos-7-install -f

ctrees
2018-01-17 22:14
just was pinged about this: https://github.com/google/netboot

zehicle
2018-01-17 22:29
we monitor that - if you look at the commit history, it's not actively maintained there. Also, very narrow function.

vlowther
2018-01-17 22:32
@florent.wagener https://github.com/digitalrebar/provision/pull/641 should make your R620 box work in uefi mode. It works for my T320.

ctrees
2018-01-17 22:36
Oh... I've been pushing people to look at drp... and this is just them ping'n back...

kamp.scott
2018-01-17 23:19
so.... when you crash a rebar server, say total loss, you literally lose access to all your VMs ? as there's no dhcp running... is this correct ?

kamp.scott
2018-01-17 23:20
scenario: rebar server failed... now i can no longer access the vms that got dhcp from the rebar box

shane
2018-01-17 23:28
@kamp.scott - nope

shane
2018-01-17 23:30
IF DRP server is completely down - then you can not answer new DHCP queries. Existing DHCP leases remain until the renewal period in the DHCP server expires - then you'll lose IP access. This is DHCP - not Digital Rebar. So - if you are heavily leveraging DRP for DHCP services - then you might be wise to increase the DHCP lease period - so DHCP assignments live a lot longer than you expect any outage to be.
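
To put a number on "longer than you expect any outage to be", here is a rough Python sketch. It assumes clients try to renew at the usual DHCP default of half the lease time (T1), so in the worst case a client has only half a lease left when the server disappears. The function name is hypothetical, and 6000 and 72000 are used only as example lease lengths in seconds.

```python
def worst_case_outage_seconds(lease_seconds: int, renew_fraction: float = 0.5) -> float:
    """Longest outage every client is guaranteed to ride out.

    Assumes a client was just about to renew (at renew_fraction of the
    lease) when the DHCP server went down, so only the remainder is left.
    """
    return lease_seconds * (1.0 - renew_fraction)


short_lease = worst_case_outage_seconds(6000)
long_lease = worst_case_outage_seconds(72000)
print(short_lease / 60, "minutes vs", long_lease / 3600, "hours")
```

Under those assumptions a 6000 second lease only guarantees about 50 minutes of cover, while 72000 seconds guarantees about 10 hours.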

shane
2018-01-17 23:31
Since the DHCP leases are maintained as simple JSON files in the filesystem layer - then it's child's play to keep backups of your DHCP lease assignments - and bring them up in a new DRP instance if for some reason you had a catastrophic failure of your DHCP/DRP based server

shane
2018-01-17 23:32
another solution is to look at a distributed Key/Value store for the backend filesystem layer - currently we support HashiCorp Consul - this is for extreme high availability scenarios - where you have a cluster of Consul servers storing the Key/Value data for the DRP service - including the DHCP leases

kamp.scott
2018-01-17 23:32
right

shane
2018-01-17 23:33
remember - this is basic DHCP stuff - not specific to DRP - however, we do provide a lot of VERY easy mechanisms to manage higher availability than most other DHCP and provisioning servers have - via easy manipulation of the Lease data information, and via the Key/Value based distributed storage mechanism

kamp.scott
2018-01-17 23:51
@shane is there a doc on the consul config ?

greg
2018-01-17 23:54
@florent.wagener - tip now has better UEFI support. Give it a shot please.

florent.wagener
2018-01-17 23:55
@greg thanks I'll test that tomorrow morning :slightly_smiling_face:

greg
2018-01-18 00:01
awesome

florent.wagener
2018-01-18 00:43
btw, what's the best way to upgrade drp? Right now I'm running a clone from the master branch of the github repo.

shane
2018-01-18 00:54
@kamp.scott the Consul K/V piece isn't a community feature, it's a RackN enterprise support feature

shane
2018-01-18 00:54
@florent.wagener - pretty simple in principle - kill `dr-provision` service, replace the binary with the new one, start `dr-provision`

shane
2018-01-18 00:55
this is basically all the `install.sh` script does with the `--upgrade` flag enabled

shane
2018-01-18 00:55
(`curl -s get.rebar.digital/stable | bash -s -- install --upgrade --force --version=tip --isolated`) (does Isolated install of TIP content)

kamp.scott
2018-01-18 01:04
drpcli -f machines bootenv 9102b704-03ea-40cd-becd-0a65f1d09651 ubuntu-16.04-install
Error: ValidationError: machines/9102b704-03ea-40cd-becd-0a65f1d09651: Can not change bootenv while in a stage unless forced. old: sledgehammer new ubuntu-16.04-install

kamp.scott
2018-01-18 01:04
GGGGRRRRRRRRR

shane
2018-01-18 01:20
@ctrees and @kamp.scott - please note that a Stage and a BootEnv are two different things - Even though the names are the same (for convenience - a Stage of "ubuntu-16.04-install" bears the same name as the BootEnv that it implements)

shane
2018-01-18 01:25
@shane uploaded a file: https://rackn.slack.com/files/U6QFVRJNB/F8UFHRY67/stage_-vs-_bootenv.sh and commented: here's an example of the problem you're running in to

shane
2018-01-18 01:26
note the change from `bootenv` to `stage`

florent.wagener
2018-01-18 01:56
@shane thanks !

shane
2018-01-18 01:57
@florent.wagener - no problem - let me know if you bump in to issues or have questions

florent.wagener
2018-01-18 01:57
Will do. Time to disconnect for me now :slightly_smiling_face: Talk to you tomorrow :slightly_smiling_face:

shane
2018-01-18 01:58
cheers

andreas.holmsten
2018-01-18 09:02
has joined #json

2018-01-18 12:23
ok still not getting these reservations. i've added a MAC and ip for the machines but they still dhcp a new ip from the pool and boot into sledgehammer instead of off disk

kamp.scott
2018-01-18 12:48
sorry i'm here now .... ok still not getting these reservations. i've added a MAC and ip for the machines but they still dhcp a new ip from the pool and boot into sledgehammer instead of off disk

wdennis
2018-01-18 13:00
@kamp.scott How important is it to you to preserve the current machine records in DRP?

wdennis
2018-01-18 13:01
And, I'm assuming you are providing DHCP to your provisioning network via DRP?

kamp.scott
2018-01-18 13:01
@wdennis i have 10 machines already prebuilt / booting from hard disk... i don't want drp to do anything to them

wdennis
2018-01-18 13:03
It seems that the machine records are tied to the IP address, not the MAC... If the IP changes, DRP would see it as a new machine (my understanding)

wdennis
2018-01-18 13:04
To wit:
```
$ drpcli -E https://192.168.1.148:8092 machines show 174c3987-22a4-43d4-9eb9-0247162e8628 | jq 'del(.Params."gohai-inventory")'
{
  "Address": "192.168.1.102",
  "Available": true,
  "BootEnv": "local",
  "CurrentJob": "9d7d2564-26b4-439e-a049-c5b959b6da32",
  "CurrentTask": 0,
  "Description": "Dell PowerEdge R310",
  "Errors": [],
  "Meta": {
    "feature-flags": "change-stage-v2"
  },
  "Name": "k8s-ingress",
  "OS": "ubuntu-16.04",
  "Params": {
    "ipmi/address": "idrac-796MQW1",
    "ipmi/password": "********",
    "ipmi/username": "root"
  },
  "Profile": {
    "Available": false,
    "Description": "",
    "Errors": null,
    "Meta": null,
    "Name": "",
    "Params": null,
    "ReadOnly": false,
    "Validated": false
  },
  "Profiles": [
    "k8s-cluster1"
  ],
  "ReadOnly": false,
  "Runnable": true,
  "Secret": "*********",
  "Stage": "complete",
  "Tasks": [],
  "Uuid": "174c3987-22a4-43d4-9eb9-0247162e8628",
  "Validated": true
}
```

2018-01-18 13:04
Time to feed the :bear:!

wdennis
2018-01-18 13:04
Notice there is only an "Address" attribute, and no "MAC" attribute

kamp.scott
2018-01-18 13:08
@wdennis there is address / token / strategy which translates to 2xx.3xx.4xx.4xx / ba:25:96:29:71:1f / MAC

kamp.scott
2018-01-18 13:08
at least thats whats in the ui

kamp.scott
2018-01-18 13:09

kamp.scott
2018-01-18 13:10
drpcli reservations create '{ "Addr": "1.1.1.1", "Token": "08:00:27:33:77:de", "Strategy": "MAC" }'

kamp.scott
2018-01-18 13:11
now... seemingly my reservations ip are outside the configured subnet scope

kamp.scott
2018-01-18 13:11
148.251.24.7/27 148.251.24.11 148.251.24.29 6000 72000 is my subnet

kamp.scott
2018-01-18 13:13
my reservation is 148.251.24.4 da:45:cf:f6:6b:11 MAC

kamp.scott
2018-01-18 13:13
be smarter if we could simple "ignore" the MAC - do nothing

wdennis
2018-01-18 13:15
@kamp.scott You can't ignore the MAC - that's what DHCP uses to know what machine gets what IP

wdennis
2018-01-18 13:16
And reservations should be outside of the dynamic address pool
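
A quick way to sanity-check where a reservation sits relative to the subnet is sketched below in Python, using only the standard `ipaddress` module. It assumes the second and third fields of the subnet line above (148.251.24.11 and 148.251.24.29) are the start and end of the dynamic range; `classify` is a hypothetical helper.

```python
import ipaddress


def classify(addr: str, subnet_cidr: str, range_start: str, range_end: str) -> str:
    """Say whether addr is outside the subnet, in the dynamic range, or in between."""
    ip = ipaddress.ip_address(addr)
    # ip_interface accepts a host address with a prefix, e.g. 148.251.24.7/27
    network = ipaddress.ip_interface(subnet_cidr).network
    if ip not in network:
        return "outside subnet"
    if ipaddress.ip_address(range_start) <= ip <= ipaddress.ip_address(range_end):
        return "inside dynamic range"
    return "in subnet, outside dynamic range"


reservation = classify("148.251.24.4", "148.251.24.7/27", "148.251.24.11", "148.251.24.29")
print(reservation)
```

For the reservation quoted above this prints "in subnet, outside dynamic range", which is the placement described as correct for a reservation.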

kamp.scott
2018-01-18 13:17
but its ignoring the reservation

wdennis
2018-01-18 13:17
Does it already have a dynamic lease in the "db" for that MAC?

kamp.scott
2018-01-18 13:17
it boots into sledgehammer

wdennis
2018-01-18 13:18
You'd have to remove the dynamic lease record first; the DHCP server would already "know" what IP addr goes with that MAC

wdennis
2018-01-18 13:22
Do this - run this command and post back what it says: `drpcli machines list -E https://<your-drp-ip>:8092 | jq '.[] | .Name + ", " + .Address + ", " + .BootEnv'`
For instance, on my DRP system, I get:
```
"testinstall, 192.168.1.125, local"
"k8s-ingress, 192.168.1.102, local"
"testnode03, 192.168.1.114, local"
"testnode04, 192.168.1.132, local"
"testnode02, 192.168.1.110, local"
"testnode01, 192.168.1.123, local"
"is-ef-n1, 192.168.1.112, sledgehammer"
```

wdennis
2018-01-18 13:23
(you don't need the `-E https://<your-drp-ip>:8092` part if you are running `drpcli` off the DRP host itself - that's for a remote `drpcli`)

vlowther
2018-01-18 14:12
I will take a look at that.

wdennis
2018-01-18 14:12
@vlowther at what?

vlowther
2018-01-18 14:12
precreated reservations and machines not behaving the way you expect.

wdennis
2018-01-18 14:13
ah

wdennis
2018-01-18 14:14
If there exists a dyn lease record for a MAC, and then one enters a static lease record for that same MAC, which one "wins"?

vlowther
2018-01-18 14:14
@kamp.scott Is that example machine you posted above one that you precreated and expect to boot into Sledgehammer, or something else?

shane
2018-01-18 14:14
@andreas.holmsten welcome

ctrees
2018-01-18 14:16
I'm working through the same sort of thing except with an array of ILO devices... Wondering about 'missing mac'

wdennis
2018-01-18 14:16
Or does (should) the DRP system detect a prior MAC entry, and remove it in favor of the new one? (not sure how this works w/ DRP DHCP server)

ctrees
2018-01-18 14:16

wdennis
2018-01-18 14:17
@ctrees Do you have any statics mapped?

ctrees
2018-01-18 14:18
mapped in drp ? or ?

wdennis
2018-01-18 14:18
yes, in DRP

wdennis
2018-01-18 14:18
Or are you using outboard DHCP?

greg
2018-01-18 14:19
@ctrees - Add `.State` to your jq magic.

ctrees
2018-01-18 14:19
what I did was put statics outside the DHCP

ctrees
2018-01-18 14:20

greg
2018-01-18 14:21
So, `INVALID` means that DRP decided they weren't safe to use. Either something responded to ping or they were NAKed by the client.

greg
2018-01-18 14:21
`ACK` means what it sounds like

greg
2018-01-18 14:22
`OFFER` means pending.
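
As a small illustration of the three states, the Python sketch below filters a lease list by `State`. The field names `Addr`, `State`, `Strategy` and `Token` match the `drpcli leases list` output pasted elsewhere in this channel; the second sample lease (address and empty token) is made up, and `leases_in_state` is a hypothetical helper.

```python
from typing import Dict, List

# Meanings as described in the messages above.
LEASE_STATE_MEANING = {
    "ACK": "lease acknowledged and in use",
    "OFFER": "offer sent, still pending",
    "INVALID": "judged unsafe: something answered ping or the client NAKed it",
}


def leases_in_state(leases: List[Dict], state: str) -> List[str]:
    """Return the addresses of all leases currently in the given state."""
    return [lease["Addr"] for lease in leases if lease.get("State") == state]


sample_leases = [
    {"Addr": "148.251.24.5", "State": "ACK", "Strategy": "MAC", "Token": "ba:25:96:29:71:1f"},
    # Hypothetical entry standing in for an address DRP refused to hand out.
    {"Addr": "192.0.2.10", "State": "INVALID", "Strategy": "MAC", "Token": ""},
]

invalid = leases_in_state(sample_leases, "INVALID")
for addr in invalid:
    print(addr, "->", LEASE_STATE_MEANING["INVALID"])
```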

ctrees
2018-01-18 14:24
hum... so with @vlowther websocket log alert... I could just listen to 'traffic' on a 'dirty toilet' and 'clean-up' a network... hum...

greg
2018-01-18 14:24
umm - not perhaps how I would have put it, but yes.

ctrees
2018-01-18 14:31
how does something make a request without a mac ?

wdennis
2018-01-18 14:31
@ctrees How would that work?

ctrees
2018-01-18 14:32
you mean the toilet thing?

wdennis
2018-01-18 14:32
No, "make a request without a mac"

ctrees
2018-01-18 14:33
don't know... that's what I'm curious about... if it's INVALID... I'd like to figure out where it came from... and would think a mac would be involved

ctrees
2018-01-18 14:34
for drp to even see it... it'd have to be in at least arp ?

wdennis
2018-01-18 14:35
ARP is MAC (layer 2) to IP (layer 3) mapping


ctrees
2018-01-18 14:36
thanks

kamp.scott
2018-01-18 14:37
@vlowther the machines were all created statically addressed before rebar was deployed. now if we reboot one it drops it into sledgehammer

kamp.scott
2018-01-18 14:38
so i shut down rebar and rebooted the static machine to its orig state then started rebar again and created a "reservation" however rebar still tries to install it

wdennis
2018-01-18 14:40
@kamp.scott Do you have a DHCP server that is separate from DRP on your deployment network?

kamp.scott
2018-01-18 14:40
no the only thing doing dhcp is rebar. the machines in question have a "static" ip

vlowther
2018-01-18 14:41
ok

wdennis
2018-01-18 14:41
So, they do NOT use DHCP to address themselves?

vlowther
2018-01-18 14:41
Can you give me an example of what JSON you are passing in to create the machine and the reservation?

kamp.scott
2018-01-18 14:42
@vlowther i did it in the ui

vlowther
2018-01-18 14:42
DRP uses ping to see if it can issue a lease, not ARP.

kamp.scott
2018-01-18 14:42
just ip mac address and MAC

wdennis
2018-01-18 14:42
Ah I see - perhaps he's creating machine rec's manually

kamp.scott
2018-01-18 14:42
well they pxe boot

kamp.scott
2018-01-18 14:43
@vlowther want to see the ui ?

vlowther
2018-01-18 14:43
we use ping because arp is limited to the local subnet, and we can handle remote subnets.

vlowther
2018-01-18 14:44
@kamp.scott I am more interested in what the JSON for the machine winds up looking like.

kamp.scott
2018-01-18 14:44
these 10 ips are in the same subnet which is why i created a reservation for them thinking they would just get the same ip and boot from disk

kamp.scott
2018-01-18 14:45
i guess we need rebar to think it's already done its job for these ips :slightly_smiling_face:

vlowther
2018-01-18 14:45
right, which is why I need to see what the machine JSON looks like from the CLI

wdennis
2018-01-18 14:45
Q: does PXE always use DHCP? Or can you somehow assign static to the PXE-boot bios for the NIC?

vlowther
2018-01-18 14:46
PXE is a pseudo-standard layered on top of DHCP

wdennis
2018-01-18 14:46
I've never done PXE without DHCP being involved...

wdennis
2018-01-18 14:46
lol "pseudo-standard"

vlowther
2018-01-18 14:47
If it isn't an RFC, ANSI, ISO, or similar standard, it is a pseudo-standard. :slightly_smiling_face:

wdennis
2018-01-18 14:47
Welcome to your hell :wink:

kamp.scott
2018-01-18 14:47
@vlowther ok how do i get that data for you from the cli?

kamp.scott
2018-01-18 14:48
drpcli leases list
[
  {
    "Addr": "148.251.24.5",
    "Available": true,
    "Errors": [],
    "ExpireTime": "2018-01-18T10:01:58.851340004-05:00",
    "Meta": {},
    "ReadOnly": false,
    "State": "ACK",
    "Strategy": "MAC",
    "Token": "ba:25:96:29:71:1f",
    "Validated": true
  },

vlowther
2018-01-18 14:49
drpcli machines get Name:<machine name>

kamp.scott
2018-01-18 14:49
drpcli reservations list
[
  {
    "Addr": "148.251.24.4",
    "Available": true,
    "Errors": [],
    "Meta": {},
    "NextServer": "",
    "Options": [],
    "ReadOnly": false,
    "Strategy": "MAC",
    "Token": "da:45:cf:f6:6b:11",
    "Validated": true
  },
  {
    "Addr": "148.251.24.5",
    "Available": true,
    "Errors": [],
    "Meta": {},
    "NextServer": "",
    "Options": [],
    "ReadOnly": false,
    "Strategy": "MAC",
    "Token": "ba:25:96:29:71:1f",
    "Validated": true
  }
]

vlowther
2018-01-18 14:49
will get a single name.

vlowther
2018-01-18 14:50
@kamp.scott PM that to me?

shane
2018-01-18 14:50
(please use "Snippet" -- the plus symbol to left of input box -- in Slack for code like the above)

shane
2018-01-18 14:50
(it helps make long info collapsible and makes the channel more readable)

vlowther
2018-01-18 14:50
yeah, that too. :slightly_smiling_face:

wdennis
2018-01-18 14:51
And remember to filter the gohai stuff out -- `drpcli machines list | jq '.[] | del(.Params."gohai-inventory")'`

vlowther
2018-01-18 14:51
aw, but I worked so hard to write that. :slightly_smiling_face:

wdennis
2018-01-18 14:52
Dude - it's awesome, but oh the output lines :joy:

kamp.scott
2018-01-18 14:52
jeeeez how do you pm someone in slack

shane
2018-01-18 14:53
scroll down on left panel

shane
2018-01-18 14:53
find Direct Messages - click on + to pop up selection panel

shane
2018-01-18 14:53
or - you can use `/msg` command

shane
2018-01-18 14:54
or `/dm` command

kamp.scott
2018-01-18 14:54
yupp got it

kamp.scott
2018-01-18 15:02
so anyway... not that i want to be rebooting our mail server often but it did just happen. lucky for us it wasn't totally automated or we'd have a fresh install

kamp.scott
2018-01-18 15:02
and no more mailserver

zehicle
2018-01-18 16:34
@kamp.scott I'd suggest not having install-os as a default if you are mixing production servers into your process. Discovery is a safer default since it's not destructive.

kamp.scott
2018-01-18 17:14
welp @vlowther solved my existing systems issue nicely

vlowther
2018-01-18 17:19
tl;dr: to keep DRP from messing with a machine, create a reservation for it and create a machine with the reserved IP, stage none and bootenv local

vlowther
2018-01-18 17:21
That will make sure it gets a consistent IP, that it has no tasks to run, and that the PXE files we write for it will have it boot to the local disk.
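
A sketch of the two objects that recipe describes, built in Python so they can be dumped as JSON. The reservation fields (`Addr`, `Token`, `Strategy`) follow the `drpcli reservations create` example pasted earlier, and the machine fields (`Name`, `Address`, `Stage`, `BootEnv`) follow the machine JSON shown in this channel. The machine name is made up, `hands_off_objects` is a hypothetical helper, and how the machine JSON gets submitted to DRP is left out on purpose.

```python
import json
from typing import Dict, Tuple


def hands_off_objects(name: str, ip: str, mac: str) -> Tuple[Dict, Dict]:
    """Build a reservation and a machine record that leave an existing host alone."""
    reservation = {"Addr": ip, "Token": mac, "Strategy": "MAC"}
    machine = {"Name": name, "Address": ip, "Stage": "none", "BootEnv": "local"}
    return reservation, machine


# "mailserver" is an illustrative name; the IP and MAC come from the thread.
reservation, machine = hands_off_objects("mailserver", "148.251.24.4", "da:45:cf:f6:6b:11")
print(json.dumps(reservation))
print(json.dumps(machine))
```

The key point is that both objects carry the same IP, so the DHCP answer and the machine record line up.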

kamp.scott
2018-01-18 17:39
ok... cool all machines created.... now onto configuring a workflow....

kamp.scott
2018-01-18 17:39
or should i tackle kribs for my infrastructure first :slightly_smiling_face:

shane
2018-01-18 18:43
KRIBs is just a workflow - nothing more

kamp.scott
2018-01-18 19:16
seems the work flow..... i've got 3 systems that did the centos-install and rebooted. now they just seem to be sitting here

kamp.scott
2018-01-18 19:18
static 4a4900b9-4116-4057-aee0-84219ad6d12b 148.251.24.13 local local
static 7b17aa61-2b44-4d1a-8333-d3ea105bc1d1 148.251.24.14 local local
static 8580bec6-c2d2-420e-bc10-42bf87c25c89 148.251.24.15 local local

shane
2018-01-18 19:29
"sitting where"

shane
2018-01-18 19:30
you have those 3 systems to boot to local disk - not to do any provisioning with "local/local"

kamp.scott
2018-01-18 19:31
drpcli profiles show k8s-cluster | jq -r '.Params."krib/cluster-master"'
null

kamp.scott
2018-01-18 19:33
@shane i followed the kribs guide

kamp.scott
2018-01-18 19:34
for install-to-local-disk mode:
centos-7-install -> runner-service:Success
runner-service -> finish-install:Stop
finish-install -> docker-install:Success
docker-install -> krib-install:Success
krib-install -> complete:Success
discover -> sledgehammer-wait:Success

shane
2018-01-18 19:35
please show me the command you used to display your 3 machines above

kamp.scott
2018-01-18 19:39
@shane that was pasted from the ui under machines

shane
2018-01-18 19:40
right - so "local" stage and "local" bootenv tells Digital Rebar Provision to ignore the Machines and have them just boot to the locally installed operating system

shane
2018-01-18 19:40
you can not perform any further workflow or provisioning

shane
2018-01-18 19:40
you must change the machines Stage to start it on the work flow for KRIB

kamp.scott
2018-01-18 19:40
i did that

kamp.scott
2018-01-18 19:40
i edited each and added the profile

kamp.scott
2018-01-18 19:41
then rebooted per the instructions

kamp.scott
2018-01-18 19:41
they then installed centos-7 and rebooted then nothing more


shane
2018-01-18 19:41
the last few paragraphs of that section

shane
2018-01-18 19:41
you need to change the Stage of the Machines to start them on the KRIB install process

shane
2018-01-18 19:42
starting from: _"Change stage on the Machines to initiate the Workflow transition."_

shane
2018-01-18 19:44

ctrees
2018-01-18 19:45
and remember what @shane told us last night... please note that a Stage and a BootEnv are two different things (see 7:20PM shane)

kamp.scott
2018-01-18 19:49
ok maybe im an idiot but ive done what you both are telling me to do

ctrees
2018-01-18 19:50

ctrees
2018-01-18 19:51
or I'm still confused on when you 'should could' Stage vs BootEnv

shane
2018-01-18 19:52
can you please provide a screenshot of UI Machines panel - or run this `drpcli` command: `drpcli machines list | jq -r '.[] | "\(.Name) : \(.Stage) : \(.BootEnv)"'`

shane
2018-01-18 19:53
@ctrees see the Note - this is not using Workflow/Stages to move a machine through provisioning process

shane
2018-01-18 19:54
using a workflow and stages is different

shane
2018-01-18 19:54
that process highlights manually moving a machine through provisioning steps

ctrees
2018-01-18 19:59
yea... but the instructions on the quickstart did not work 4me (I had to use stage, not bootenv... as the machine was sitting in sledgehammer... I think..) and still wrapping my head around it... I get now that workflow need to plan for expected future events... I was going back over rob and greg video of some of that to see what I was missing...

kamp.scott
2018-01-18 19:59
drpcli machines list | jq -r '.[] | "\(.Name) : \(.Stage) : \(.BootEnv)"'
http://static.18.24.251.148.clients.your-server.de : centos-7-install : centos-7-install
http://static.16.24.251.148.clients.your-server.de : centos-7-install : centos-7-install
http://static.17.24.251.148.clients.your-server.de : centos-7-install : centos-7-install

kamp.scott
2018-01-18 19:59
ok this is 3 new machines

kamp.scott
2018-01-18 20:00
my_machines stage ssh-access
+ drpcli machines stage 86064587-bd6f-4999-ad0b-772ee5ed12c5 ssh-access
Error: ValidationError: machines/86064587-bd6f-4999-ad0b-772ee5ed12c5: Can not change stages with pending tasks unless forced
+ set +x
+ drpcli machines stage 1d064d6f-cf06-4a11-a1d6-7b23766ba5a0 ssh-access
Error: ValidationError: machines/1d064d6f-cf06-4a11-a1d6-7b23766ba5a0: Can not change stages with pending tasks unless forced
+ set +x
+ drpcli machines stage d72597cb-390a-4f96-8cf8-0b72dca2364b ssh-access
Error: ValidationError: machines/d72597cb-390a-4f96-8cf8-0b72dca2364b: Can not change stages with pending tasks unless forced
+ set +x

kamp.scott
2018-01-18 20:01
jeeezzz....

kamp.scott
2018-01-18 20:12
my_machines action powercycle
+ drpcli machines action 86064587-bd6f-4999-ad0b-772ee5ed12c5 powercycle
Error: GET: machines/86064587-bd6f-4999-ad0b-772ee5ed12c5: Action powercycle: Not Found
+ set +x
+ drpcli machines action 1d064d6f-cf06-4a11-a1d6-7b23766ba5a0 powercycle
Error: GET: machines/1d064d6f-cf06-4a11-a1d6-7b23766ba5a0: Action powercycle: Not Found
+ set +x
+ drpcli machines action d72597cb-390a-4f96-8cf8-0b72dca2364b powercycle
Error: GET: machines/d72597cb-390a-4f96-8cf8-0b72dca2364b: Action powercycle: Not Found
+ set +x

kamp.scott
2018-01-18 20:12
doesnt even work

shane
2018-01-18 20:13
did you install an IPMI plugin to support power actions ?

shane
2018-01-18 20:13
if not you can't power cycle

shane
2018-01-18 20:14
I failed to mention that in the Doc - you need a Plugin to implement the IPMI power actions

shane
2018-01-18 20:15
actually - I did mention it - briefly

kamp.scott
2018-01-18 20:15
i rebooted them from console

kamp.scott
2018-01-18 20:15
they're installing centos-7 again

kamp.scott
2018-01-18 20:16
then they'll probably just reboot like before and do nothing

kamp.scott
2018-01-18 20:16
centos-7-install Start
runner-service Success
finish-install Stop
docker-install Success
krib-install Success
complete Success
discover Start
sledgehammer-wait Success

kamp.scott
2018-01-18 20:16
thats the work flow for the profile

greg
2018-01-18 20:16
What profile did you create this in?

kamp.scott
2018-01-18 20:16
k8s

kamp.scott
2018-01-18 20:17
i added the profile

greg
2018-01-18 20:17
Is that the only profile on the machines?

kamp.scott
2018-01-18 20:17
well there is default

greg
2018-01-18 20:17
default?

kamp.scott
2018-01-18 20:19

kamp.scott
2018-01-18 20:20
each machine has the k8s-cluster profile

greg
2018-01-18 20:20
`drpcli profiles show k8s-cluster`

kamp.scott
2018-01-18 20:20
and sure enough they are just sitting there again

kamp.scott
2018-01-18 20:22

kamp.scott
2018-01-18 20:22
oh wait...is in docker install now

kamp.scott
2018-01-18 20:22
so something changed

greg
2018-01-18 20:23
The profile shows progress being made.

shane
2018-01-18 20:27
I updated the KRIB doc (in `latest`) to highlight the IPMI actions and Plugin Provider status

shane
2018-01-18 20:30
@kamp.scott - I wouldn't recommend leaving those pastebins up - you have sensitive tokens and cluster admin related information in those

kamp.scott
2018-01-18 20:34
hrmmm seems hung...2 are still on docker-install

kamp.scott
2018-01-18 20:35
i'll leave it be for a while

greg
2018-01-18 20:35
Or you could check the jobs system and see what they say.

greg
2018-01-18 20:35
For example, the jobs for each machine running docker-install. There should be a job with a log for that.

kamp.scott
2018-01-18 22:30
nope seems it's hung up on 2 nodes in krib-install

shane
2018-01-18 22:35
check the job log and see why

kamp.scott
2018-01-18 23:13
@shane i would if the ui would load it

kamp.scott
2018-01-18 23:13
anyway to do it from cli ?

kamp.scott
2018-01-18 23:14
2018-01-18T17:44:12 n/a 5ec554d0-0b5d-4bf9-82d9-296ad30af345 e99c4d85-73ca-461e-a599-fc8463a00a7a krib-install krib-install

shane
2018-01-18 23:15
`drpcli jobs list` for all jobs
`drpcli jobs list | jq ".[] | select(.Machine==\"$UUID\")"` for the jobs of a single machine

shane
2018-01-18 23:15
(where $UUID is a variable holding the UUID of the machine you'd like to inspect jobs for)
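
The same filter can be sketched in Python, which may be easier to extend than a jq one-liner. Only the `Machine` field is taken from the jq expression above; the other field names and all sample values are assumptions for illustration, and `jobs_for_machine` is a hypothetical helper that works on already-parsed JSON.

```python
from typing import Dict, List


def jobs_for_machine(jobs: List[Dict], machine_uuid: str) -> List[Dict]:
    """Keep only the jobs whose Machine field matches the given UUID."""
    return [job for job in jobs if job.get("Machine") == machine_uuid]


# Made-up sample standing in for parsed `drpcli jobs list` output.
sample_jobs = [
    {"Uuid": "job-1", "Machine": "machine-a", "Task": "docker-install"},
    {"Uuid": "job-2", "Machine": "machine-b", "Task": "krib-install"},
    {"Uuid": "job-3", "Machine": "machine-a", "Task": "krib-install"},
]

matches = jobs_for_machine(sample_jobs, "machine-a")
print([job["Uuid"] for job in matches])
```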

kamp.scott
2018-01-19 09:29
ok well seems some cli things did get my KRIBs installed though i cant seem to ssh into it

kamp.scott
2018-01-19 09:31
"ssh-access": { "user1": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHr/fzI3B7dQ6KEXyPVjA0iXiPwyyAFN2/NwTeBySp290kp4wMKMUQo0cZs8hxZRhJv51zIGGcT46CyASOy9R7vHJJwP+RYVA4LuGKhbFvI4nB3BdCF2M+Rbsc+RR7X4NIVdsMIbbCnYKBWrk4cb8NgXLicns/pH5gL1ZFG2Zecu8H0m7JYyuRNixVJRu4Gk5iGZwGfqyL5iOvQhuD5FpCmQXYoU3CGSALFzRh8DfFDA9ZhdjfR2b/x9feeBdTjLB8kEa0YmqBgPwsW1r8GiV0pRvW8ROEx6RJCRhaUGcg2aE+Re6s+h6IiHPv59TzQjwWNoxDKhSj+WjPg3Jhh+PZ dingo@new-host-2" }

kamp.scott
2018-01-19 09:31
is the key config from the yaml

kamp.scott
2018-01-19 09:32
i tried both as root and as user1

kamp.scott
2018-01-19 09:32
keeps asking me for a password

ctrees
2018-01-19 13:19
I was just messing with this... and found this helpful:


ctrees
2018-01-19 13:22
Have not done the krib install... but was about to ask about this...

ctrees
2018-01-19 13:23
RackN-Portal -> Profiles -> root-access-example

ctrees
2018-01-19 13:24
access-keys: { "greg": "ssh-rsa blahblah... galthaus@Gregs-MacBook-Pro.local" }

ctrees
2018-01-19 13:25
access-ssh-root-mode: "without-password"

ctrees
2018-01-19 13:30
I was looking at how the "user": "ssh-rsa ... user@machine" association is done... but in your case I 'THINK' you just need the access-ssh-root-mode: "without-password" ? part ?


ctrees
2018-01-19 13:34
access_keys Map of strings The key is the name of the public key. The value is the public key. All keys are placed in the .authorized_keys file of root.

kamp.scott
2018-01-19 14:32
ssh-access: { "user1": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDHr/fzI3B7dQ6KEXyPVjA0iXiPwyyAFN2/NwTeBySp290kp4wMKMUQo0cZs8hxZRhJv51zIGGcT46CyASOy9R7vHJJwP+RYVA4LuGKhbFvI4nB3BdCF2M+Rbsc+RR7X4NIVdsMIbbCnYKBWrk4cb8NgXLicns/pH5gL1ZFG2Zecu8H0m7JYyuRNixVJRu4Gk5iGZwGfqyL5iOvQhuD5FpCmQXYoU3CGSALFzRh8DfFDA9ZhdjfR2b/x9feeBdTjLB8kEa0YmqBgPwsW1r8GiV0pRvW8ROEx6RJCRhaUGcg2aE+Re6s+h6IiHPv59TzQjwWNoxDKhSj+WjPg3Jhh+PZ dingo@new-host-2" }

kamp.scott
2018-01-19 14:33
is whats in my krib profile

kamp.scott
2018-01-19 14:33
and doesnt seem to work

kamp.scott
2018-01-19 14:35

ctrees
2018-01-19 14:35
Well.. the krib file may assume that the global profile has the access-ssh-root-mode: "without-password"

ctrees
2018-01-19 14:42
the 'keep asking for pass' seems like -> access-ssh-root-mode: "without-password" is missing...

kamp.scott
2018-01-19 14:52
@ctrees i get it but i think if that was the case it would be in the K8s profile... right?

ctrees
2018-01-19 14:56
Well.. when they did the demo, they were using a packet profile (I THINK)... and using packet the ssh key setup is done in that profile...

ctrees
2018-01-19 14:57
aka... it's a boot-strap-env thing ?

ctrees
2018-01-19 14:59
BTW... I'm sure shane can get you some packet 'free time' setup a packet endpoint and I bet krib works 'out-of-box' and he has a terraform example for that in the repo...

kamp.scott
2018-01-19 14:59
i have my own servers

greg
2018-01-19 15:25
@kamp.scott - it is NOT `ssh-access`. It is `access-keys`. Check the params in the UX. You will see the keys to set.

greg
2018-01-19 15:26
Updating docs to change it from `access_keys` to `access-keys`

kamp.scott
2018-01-19 15:26
@greg then the profile is wrong in the docs on KRIB ?

greg
2018-01-19 15:27
Yes - krib is wrong and needs to be updated.

kamp.scott
2018-01-19 15:29
@greg so i need to reinstall the whole cluster ?

greg
2018-01-19 15:31
probably safest.

kamp.scott
2018-01-19 15:32
ughhh

kamp.scott
2018-01-19 15:32
:

greg
2018-01-19 15:33
Basically, I'm not sure I can walk you through the running of a single stage to set the keys.

greg
2018-01-19 15:34
For example, I think you could set the machines to the ssh-access stage.

greg
2018-01-19 15:34
It will rerun the ssh-access task.

greg
2018-01-19 15:34
Assuming you've edited the profile to have the correct key.

greg
2018-01-19 15:34
then you can set the stage back to complete.

greg
2018-01-19 15:36
That should work, but I don't know if you have machines in the correct state, if you have the job knowledge to verify that the ssh-access task ran, and when to decide it was successful.

greg
2018-01-19 15:37
Okay - here are the steps to do (thinking about them):

greg
2018-01-19 15:41
1. Run this: `drpcli machines list | jq -r '.[] | "\(.Name), \(.Runnable), \(.Stage), \(.CurrentTask)"'`

greg
2018-01-19 15:41
The machines you want to manipulate should have "<name>, true, complete, 0"

greg
2018-01-19 15:42
2. Edit the profile to fix the `ssh-access` to `access-keys`

greg
2018-01-19 15:42
3. For each machine, set the stage to `ssh-access`

greg
2018-01-19 15:44
4. Run the command from #1 until the stage shows `ssh-access` and currentTask shows `1`

greg
2018-01-19 15:44
5. Once that is done, you should be able to ssh into the boxes.

greg
2018-01-19 15:45
6. For each machine, set the stage to `complete`

greg
2018-01-19 15:45
@kamp.scott that may get you back to ssh-access.
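
Steps 1 and 4 above amount to checking the same four fields on every machine. A minimal sketch of that check in Python, assuming the JSON from `drpcli machines list` has been loaded into a list of dicts. The field names come from the jq filter in step 1; the sample machines are made up for illustration.

```python
import json

def ready_for_ssh_access_rerun(machines):
    """Names of machines that show '<name>, true, complete, 0' (step 1)."""
    return [
        m["Name"]
        for m in machines
        if m.get("Runnable") is True
        and m.get("Stage") == "complete"
        and m.get("CurrentTask") == 0
    ]

def ssh_access_finished(machine):
    """True once the stage shows ssh-access and currentTask shows 1 (step 4)."""
    return machine.get("Stage") == "ssh-access" and machine.get("CurrentTask") == 1

# Hypothetical sample data standing in for `drpcli machines list` output.
sample = json.loads("""
[
  {"Name": "node1", "Runnable": true,  "Stage": "complete",       "CurrentTask": 0},
  {"Name": "node2", "Runnable": false, "Stage": "complete",       "CurrentTask": 0},
  {"Name": "node3", "Runnable": true,  "Stage": "centos-install", "CurrentTask": 3}
]
""")

print(ready_for_ssh_access_rerun(sample))
```

Only the machines returned by the first function are candidates for step 3. The second function is the test to repeat after setting the stage.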

kamp.scott
2018-01-19 16:09
ok i reinstalled after changing the ssh-access to access-keys

kamp.scott
2018-01-19 16:09
im in now

kamp.scott
2018-01-19 16:14
@greg is there no kubernetes ui exposed on the master public ip ?

shane
2018-01-19 16:16
please see the Video in the KRIB documentation link

shane
2018-01-19 16:16
it's discussed there

kamp.scott
2018-01-19 16:37
no not really, i dont want to use a local proxy i want to expose the ui on the public side

ctrees
2018-01-19 16:38
oh... sorry..

kamp.scott
2018-01-19 16:39
no idea why they would do it this way....whats the logic

kamp.scott
2018-01-19 16:40
aside from that i dont have kubectl installed on my local laptop

kamp.scott
2018-01-19 16:43
@shane so no joy here i guess

kamp.scott
2018-01-19 17:54
$KRIB


kamp.scott
2018-01-19 18:00
127.0.0.1 sent an invalid response. ERR_SSL_PROTOCOL_ERROR

kamp.scott
2018-01-19 18:00
jesus

greg
2018-01-19 18:02
What command did you run?


kamp.scott
2018-01-19 18:15
ok from my laptop

kamp.scott
2018-01-19 18:16
export KUBECONFIG=`pwd`/admin.conf
kubectl get nodes
2018-01-19 13:15:33.601486 I | proto: duplicate proto type registered: google.protobuf.Any
2018-01-19 13:15:33.601637 I | proto: duplicate proto type registered: google.protobuf.Duration
2018-01-19 13:15:33.601688 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAME STATUS AGE VERSION
http://static.27.24.251.148.clients.your-server.de Ready 2h v1.9.2
http://static.28.24.251.148.clients.your-server.de Ready 2h v1.9.2
http://static.8.24.251.148.clients.your-server.de Ready 2h v1.9.2

kamp.scott
2018-01-19 18:16
then

kamp.scott
2018-01-19 18:16
kubectl proxy
2018-01-19 13:16:29.223153 I | proto: duplicate proto type registered: google.protobuf.Any
2018-01-19 13:16:29.223292 I | proto: duplicate proto type registered: google.protobuf.Duration
2018-01-19 13:16:29.223334 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Starting to serve on 127.0.0.1:8001

kamp.scott
2018-01-19 18:17
then in a browser

kamp.scott
2018-01-19 18:17

kamp.scott
2018-01-19 18:17
This site can't provide a secure connection 127.0.0.1 sent an invalid response. ERR_SSL_PROTOCOL_ERROR

greg
2018-01-19 18:17
Sorry - k8s api doesn't want ssl, I think.

kamp.scott
2018-01-19 18:17
fix the docs :slightly_smiling_face:

shane
2018-01-19 18:17
It's not a Digital Rebar problem

shane
2018-01-19 18:18
it's a Kubernetes thing - please go read the Kubernetes documentation to understand how to use it

kamp.scott
2018-01-19 18:18
24.3.8.4. Use Kubernetes Dashboard via Proxy

kamp.scott
2018-01-19 18:18
@shane i followed the doc

kamp.scott
2018-01-19 18:18
can i just expose the web ui on the public interface ?

greg
2018-01-19 18:19
Did changing https to http address your problem, @kamp.scott?

greg
2018-01-19 18:19
I doubt it. That is a k8s issue. Learn about k8s.

shane
2018-01-19 18:21
excellent Kubernetes doc on the UI and how to interact with it: https://kubernetes.io/docs/tasks/access-application-cluster/web-ui-dashboard/

greg
2018-01-19 18:21
Actually, it should be https.

greg
2018-01-19 18:21
according to that doc. So, I don't know.
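
The http-versus-https back and forth above can be untangled with a small sketch. `kubectl proxy` itself serves plain HTTP on the loopback address (the log above shows it listening on 127.0.0.1:8001), so the browser URL starts with `http://`. The `https:` part belongs inside the path, where it tells the API server how to reach the dashboard service. The namespace and service name below are assumptions based on the dashboard manifests of that era, not something confirmed in this chat.

```python
def dashboard_proxy_url(host="127.0.0.1", port=8001,
                        namespace="kube-system",          # assumed namespace
                        service="kubernetes-dashboard",   # assumed service name
                        service_scheme="https"):
    """Build the browser URL for reaching the dashboard through `kubectl proxy`.

    The outer scheme is always http because the local proxy does not speak TLS.
    """
    return (
        f"http://{host}:{port}/api/v1/namespaces/{namespace}"
        f"/services/{service_scheme}:{service}:/proxy/"
    )

print(dashboard_proxy_url())
```

Opening `https://127.0.0.1:8001/...` instead would make the browser attempt a TLS handshake against a plain HTTP listener, which is consistent with the ERR_SSL_PROTOCOL_ERROR shown earlier.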

shane
2018-01-19 18:22
yes - Kubernetes uses Certs by default

shane
2018-01-19 18:22
@kamp.scott - KRIB is a demonstration pattern of how to create Immutable or Installed Kubernetes clusters using Digital Rebar content

shane
2018-01-19 18:23
we are not a Kubernetes shop - and we don't have expertise in it - we just hack around enough to get it up and running

kamp.scott
2018-01-19 18:23
@shane well thats nice and the work is useful, however to be properly used the ui should be exposed on the master ip :slightly_smiling_face:

kamp.scott
2018-01-19 18:23
ill see if i can work to expose it myself

shane
2018-01-19 18:24
we disagree fundamentally - we prefer using a CLI here ... :slightly_smiling_face:

kamp.scott
2018-01-19 18:24
@shane we arent serving just "us" i have clients

kamp.scott
2018-01-19 18:24
i was seeing if in fact it was viable for a client to spin up a cluster

greg
2018-01-19 18:25
So, @kamp.scott - how does k8s handle multitenancy?

wdennis
2018-01-19 18:25
@Dingo Actually, in Kubernetes, everything runs on internal private IP space... Nothing gets exposed "externally" unless an admin exposes it...

kamp.scott
2018-01-19 18:26
@greg via calico or hypernetes

kamp.scott
2018-01-19 18:27
same with DC/OS

kamp.scott
2018-01-19 18:27
and mesosphere

greg
2018-01-19 18:27
hmm - then you may want to start working on a deployment workload for hypernetes. Looks interesting.

whitlow.john
2018-01-19 18:27
has joined #json

kamp.scott
2018-01-19 18:28
@greg yeah as i learn more i can contribute more

shane
2018-01-19 18:28
@whitlow.john - welcome

kamp.scott
2018-01-19 18:28
both in knowledge and profiles

greg
2018-01-19 18:28
The default k8s doesn't. It doesn't understand multitenancy. It expects external RBAC.

wdennis
2018-01-19 18:29
@Dingo DRP "KRIB" is just a "means to an end" to get nodes to 1) get a requisite OS installed; 2) install k8s pre-reqs (i.e. relevant container runtime system, such as Docker); and 3) run the "kubeadm" k8s deployment tool and pass in relevant variables from the DRP profile

kamp.scott
2018-01-19 18:30
@greg yeah i get that i however was planning to migrate services to it that require external exposure

kamp.scott
2018-01-19 18:30
@wdennis dont worry im not knocking it i was just surprised it was exposed only internally

kamp.scott
2018-01-19 18:30
we could mention that in the docs also

greg
2018-01-19 18:30
Much like @wdennis was pointing out. Services are constructed internally. k8s in AWS is nice because it automatically integrates with ELBs so that ports can be exposed and routed through the proxies. In physical envs, this is a snowflake feature. You could configure node ports (Did you change that on your ui yaml definition file?) or add lb services that hook into a provider mechanism.
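
The "node ports" option mentioned above can be sketched as a Service manifest of type NodePort. This is only an illustration: the service name, namespace, selector label and the 443/8443 ports are assumptions about a typical dashboard install, and 30443 is an arbitrary pick from the default NodePort range (30000-32767). The manifest is built as a Python dict and printed as JSON, which `kubectl apply -f -` accepts.

```python
import json

def nodeport_service(name, namespace, selector, port, target_port, node_port):
    """Return a v1 Service manifest exposing `target_port` on every node."""
    if not 30000 <= node_port <= 32767:
        raise ValueError("node_port is outside the default NodePort range")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "NodePort",
            "selector": selector,
            "ports": [
                {"port": port, "targetPort": target_port, "nodePort": node_port}
            ],
        },
    }

# All values below are hypothetical and need checking against the real cluster.
manifest = nodeport_service(
    name="kubernetes-dashboard",
    namespace="kube-system",
    selector={"k8s-app": "kubernetes-dashboard"},
    port=443,
    target_port=8443,
    node_port=30443,
)
print(json.dumps(manifest, indent=2))
```

With something like this applied, the UI would answer on port 30443 of each node's address, with no load balancer involved.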

wdennis
2018-01-19 18:30
It's strictly a k8s thing

kamp.scott
2018-01-19 18:31
right like traefik

wdennis
2018-01-19 18:31
No failing of DRP/KRIB

kamp.scott
2018-01-19 18:31
auto web proxying for a cluster

kamp.scott
2018-01-19 18:31
@wdennis nope just different than im used to is all

greg
2018-01-19 18:32
we chose the result of running kubeadm with defaults. Like the k8s community starts with. We didn't want to add FUD. Just simple straight usage. You are a truly advanced user. You are beyond KRIB.

wdennis
2018-01-19 18:33
I myself am working on a Træfik proxy for my cluster; the problem is I want to do a two-interface one, which I am not sure is supported...

kamp.scott
2018-01-19 18:34
as a side note is there a coreos drpcli bootenvs uploadiso for it ?

kamp.scott
2018-01-19 18:36
@wdennis should be feasible

wdennis
2018-01-19 18:37
@kamp.scott if you figure it out, tell me how... the Træfik guys themselves don't know if it's doable... (join Træfik Slack, read #kubernetes channel)

wdennis
2018-01-19 18:38
But anyways, that's not DRP-related, so let's not discuss that here...

zehicle
2018-01-19 19:10
IMHO, KRIB is showing how you can join nodes to a cluster immutably. That's what we hear people asking for - reboot, run in memory, join cluster. The fact that it builds the API server is a necessary bonus. Ideally, you could use Kubespray to build the cluster since it has some real operations choices.

ctrees
2018-01-19 21:12
So I was attempting the centos-7-install and it seemed to get to Stage: local BootEnv: local but does not come up... I 'suspect' the actual HD... I looked into the Job logs but did not see anything...

ctrees
2018-01-19 21:13
again... I suspect maybe a bad local disk drive

greg
2018-01-19 21:18
getting to local usually means that the kickstart post-install finished. Last step is to set the local bootenv/stage. From there, os install should finish and reboot. May have to check the console to see what is happening.

ctrees
2018-01-19 21:20
yea it went to boot local and is just sitting there...

ctrees
2018-01-19 21:20
like it doesnt have a boot sector...

ctrees
2018-01-19 21:23
and yup... I think drp did everything... I think it's something on the drive as these HP SAS I am using I'm not sure of... but was looking for some discover feed back about the state of the drive... or if that's possible... otherwise I better go validate the drives are bootable

shane
2018-01-19 21:23
did you maybe incidentally apply a `kernel-console` change (eg to `TTYS1` or similar) to global profile (or applied profile to machine) ??

greg
2018-01-19 21:23
okay - so - that leads me to believe that we installed to sda (but that may not be in the boot list).

ctrees
2018-01-19 21:25
The only global profile with an ssh key

shane
2018-01-19 21:25
cool - greg is probably on right track then

ctrees
2018-01-19 21:25
yea I think so... smells like the boot list...

shane
2018-01-19 21:25
if the console is switched to an invalid one, as soon as it hits initrd - the console "goes blank", and then if there's a boot problem - you'll never see it

shane
2018-01-19 21:26
but once the OS comes up successfully on boot medium - you should get a console back as soon as getty (or similar) spawns them

ctrees
2018-01-19 21:26
ok... that's good to know...

shane
2018-01-19 21:26
*if* it doesn't come up successfully - you'll never know which/what/why ... :disappointed:

ctrees
2018-01-19 21:27
I'll go dig in the bios abit...

kamp.scott
2018-01-21 00:04
Kubeconfig Please select the kubeconfig file that you have created to configure access to the cluster. To find out more about how to configure and use kubeconfig file, please refer to the Configure Access to Multiple Clusters section. Token Every Service Account has a Secret with valid Bearer Token that can be used to log in to Dashboard. To find out more about how to configure and use Bearer Tokens, please refer to the Authentication section.

kamp.scott
2018-01-21 00:04
ok going deeper... seems kubeconfig wants the config file im using however it says is not valid ?

kamp.scott
2018-01-21 00:04
i know.. dont tell me... its a kubernetes issue right...

zehicle
2018-01-21 00:11
if the system generated a kubeconfig file then it should be working. check out that file

kamp.scott
2018-01-21 00:20
i have the admin.conf as specified in the guide. i can run kubectl commands.... i just cant seem to login to the gui

kamp.scott
2018-01-21 00:22
basically because we really have no idea how the deployed cluster is configured i tried to deploy an app and am getting this

kamp.scott
2018-01-21 00:22
Unable to mount volumes for pod "mailserver-6d874d4c67-nm94x_default(790fcb47-fe3b-11e7-8bb5-0a90a721fa77)": timeout expired waiting for volumes to attach/mount for pod "default"/"mailserver-6d874d4c67-nm94x". list of unattached/unmounted volumes=[mailserver-claim0]

kamp.scott
2018-01-21 00:23
i know ... its a kubernetes thing but we dont have a clue how its configured or deployed or to login

kamp.scott
2018-01-21 00:24
and yes i am a bit new to kubernetes mesosphere Triton and DC/OS... i am familiar with

kamp.scott
2018-01-21 00:26
starts sounding like a whiney #^%& b1tch

zehicle
2018-01-21 00:39
to login to the Kube UI, use kubectl [configfile parm] proxy

zehicle
2018-01-21 00:40
that will give you a local proxy

zehicle
2018-01-21 00:40

zehicle
2018-01-21 00:40
assuming that's your proxy url

zehicle
2018-01-21 00:41
the configuration is default Kubeadm. you can refer to those docs for kube workflow

kamp.scott
2018-01-21 00:53
lastly.. what was the drpcli command to change the root access-key

kamp.scott
2018-01-21 00:53
@zehicle ^^

kamp.scott
2018-01-21 00:53
and thanks


zehicle
2018-01-21 18:38

ctrees
2018-01-22 14:47
template expansions follow go templates ? correct ? https://golang.org/pkg/text/template/

shane
2018-01-22 14:47
@ctrees yes

shane
2018-01-22 14:48
our Data Arch docs has a bit of info on the various golang template pieces


ctrees
2018-01-22 14:48

shane
2018-01-22 14:51
presumably, yes - but I'm not certain if our YAML handling follows that exact spec - have to rely on @greg or @vlowther to answer that

ctrees
2018-01-22 14:52
yea I was reading that and saw reference to the go docs in http://provision.readthedocs.io/en/latest/doc/arch.html

greg
2018-01-22 14:52
Close enough.

greg
2018-01-22 14:52
We use this go package: http://github.com/ghodss/yaml

greg
2018-01-22 14:53
it seems to keep pretty close to the spec. One small nit is that if you want yaml pretty printed cleanly, you need to make sure that you don't have white space at the end of lines.

ctrees
2018-01-22 14:53
Oh thanks @greg my fri foo-bar was the HP's have that built-in raid h/w

vlowther
2018-01-22 14:54
@ctrees: We only use the bits of YAML that can be represented in JSON.

vlowther
2018-01-22 14:54
So no integer map keys and the like.

vlowther
2018-01-22 14:54
Our YAML support is basically there to allow a more human-friendly alternative to JSON.
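
The "no integer map keys" point is a property of JSON itself: object keys are always strings. A short sketch using only Python's standard `json` module shows what happens to an integer key on a round trip. The sample document is made up.

```python
import json

# A mapping with an integer key, which YAML can express but JSON cannot.
doc = {1: "first", "two": 2}

encoded = json.dumps(doc)       # the integer key is coerced to the string "1"
decoded = json.loads(encoded)   # and it stays a string on the way back

print(encoded)
print(decoded)
```

Anything that has to survive a trip through JSON should therefore stick to string keys in the first place.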

greg
2018-01-22 14:54
oh - yeah that- I just assume it. :slightly_smiling_face:

ctrees
2018-01-22 14:55
... yea I'm sort of sad that yaml seems to have gotten 'accepted' as the default for SDN too... I'd rather look at JSON blobs and not count white spaces

vlowther
2018-01-22 14:55
since YAML has actual support for inlining large chunks of text. which is a thing we use rather a lot. :slightly_smiling_face:

vlowther
2018-01-22 14:56
There is that, but JSON specifically sucks when you are inlining shell scripts

ctrees
2018-01-22 14:57
the "Contents: |+" .... was what started my ... ok, what parse stuff quest..

shane
2018-01-22 14:58
I haven't worked with it much - but what I've seen of TOML - I like it - it's a bit like YAML - but doesn't care about spaces/tabs as much

vlowther
2018-01-22 14:59
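The |+ thing is a literal block scalar: the `|` keeps the line breaks in the indented text exactly as written, and the `+` keeps any trailing blank lines too. Folding line breaks into a single space is what `>` does.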

ctrees
2018-01-22 15:01
yea... but I didn't know if there is a 'stop' to that after... I assume it's until the next same yaml tree level (aka \n_ws_ws_ws match)

ctrees
2018-01-22 15:01
aka yaml rule

vlowther
2018-01-22 15:10
Yep.

ctrees
2018-01-22 15:10
@vlowther is right... that yaml inlining does make it easier to read

shane
2018-01-22 15:11
when I'm looking at templates with inline scripts - I always use the YAML output formatter (eg `drpcli ... <something> ... --format=yaml`)

greg
2018-01-22 21:11
Hi All. v3.6.0 is building and should be out shortly. Stable has been moved. Content updates as well.

greg
2018-01-22 21:13
- thanks!

zehicle
2018-01-22 21:13
well done! :cake:

ctrees
2018-01-22 21:15
I'll pull and run right after my current round of testing reboots...

greg
2018-01-22 21:15
Okay - wait for my all clear. The trees are updated, but the builds aren't quite done yet.

greg
2018-01-22 21:16
I just didn't want people to be caught off guard.

ctrees
2018-01-22 21:17
well.. my cycle tends to take over a day still... I'm not at that '5-min' shake-n bake Shane can do :wink:

shane
2018-01-22 21:20
@wdennis the new content update includes the "kickseed" capabilities as well

wdennis
2018-01-22 21:22
:tada:woo-hoo!:confetti_ball:

kamp.scott
2018-01-22 21:23
ok just noticed another "anomaly" where when installing centos VM under XenServer with 120GB disk rebar only installs a 10GB disk partition

kamp.scott
2018-01-22 21:23
and yes is also seen as xvda

greg
2018-01-22 21:25
We do this:

greg
2018-01-22 21:25
```logvol / --fstype ext4 --name=lv_root --vgname={{.Machine.ShortName}} --size=1 --grow --maxsize=10240 ```

greg
2018-01-22 21:26
Which may be capped at 10GB. I think grow grows it. Maybe not.
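
On the open question above: in kickstart, `--grow` lets the logical volume expand into free space, and `--maxsize` (given in MiB) is the ceiling for that growth. So `--size=1 --grow --maxsize=10240` would be expected to stop at 10240 MiB, i.e. 10 GiB, regardless of disk size. A simplified sketch of that arithmetic follows. It ignores other volumes competing for space in the volume group, and the function name is made up.

```python
def logvol_size_mib(free_mib, size=1, grow=True, maxsize=None):
    """Approximate final size of one logvol, all sizes in MiB."""
    if free_mib < size:
        raise ValueError("not enough free space for the minimum size")
    if not grow:
        return size
    grown = free_mib if maxsize is None else min(free_mib, maxsize)
    return max(size, grown)

# A 120 GiB disk with the logvol line quoted above.
print(logvol_size_mib(120 * 1024, size=1, grow=True, maxsize=10240))
```

Raising or dropping `--maxsize` in a copy of the kickstart template would be the knob to turn for a larger root volume.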

greg
2018-01-22 21:52
- builds are done. 3.6.0 is out

ctrees
2018-01-22 23:23
I'm having an ssh key issue... I set the access-ssh in global:
access-keys: { "drpops": "ssh-rsa AAA...blahblah...U9n31 drpops@drpe.drpfeature.test" }
access-ssh-root-mode: "without-password"
Checked jobs on the machine... it looks like it ran the template... but it keeps asking for password...

ctrees
2018-01-22 23:24
is there a default root login on head so I can go figure out what I did ?

greg
2018-01-22 23:24
are you logging in as root or drpops?

greg
2018-01-22 23:24
It sets the root keys.

ctrees
2018-01-22 23:24
drpops

ctrees
2018-01-22 23:25
which brings up another question... in robs demo the key is "user" I was debating if I should have that or ?? wasnt sure if the keyname matters ?

ctrees
2018-01-22 23:26
the key of the ssh-rsa string...

greg
2018-01-22 23:26
The keyname is just an id for you to recognize the key.

ctrees
2018-01-22 23:27
[drpops@drpe drpisolated]$ ssh root@192.168.88.102
[root@de4-11-5b-d0-83-78 ~]#

ctrees
2018-01-22 23:27
that worked... so what triggers 'root' vs 'user'

greg
2018-01-22 23:27
not sure. in ubuntu it is rebar. I think.

greg
2018-01-22 23:28
There is a parameter that changes it for ubuntu/debian.

greg
2018-01-22 23:28
centos only sets rootpw.

greg
2018-01-22 23:28
wait a minute I'm wrong.

greg
2018-01-22 23:28
It is always root ssh keys.

shane
2018-01-22 23:28
the Ubuntu seed sets a "Default User" account - while CentOS only twiddles a "root" account

ctrees
2018-01-22 23:28
... yea I got you thinking too many things..

shane
2018-01-22 23:28
as part of standard install process

ctrees
2018-01-22 23:29
I got the 'root' part... thinking now of adding additional 'users'.... I'll go read docs more...

shane
2018-01-22 23:29
in the ubuntu seed - you'll see:
```
# Default User Setup
d-i passwd/make-user boolean true
d-i passwd/user-uid string {{if .ParamExists "provisioner-default-uid"}}{{.Param "provisioner-default-uid"}}{{else}}1000{{end}}
d-i passwd/user-fullname string {{if .ParamExists "provisioner-default-fullname"}}{{.Param "provisioner-default-fullname"}}{{else if .ParamExists "provisioner-default-user"}}{{.Param "provisioner-default-user"}}{{else}}Rocket Skates{{end}}
d-i passwd/username string {{if .ParamExists "provisioner-default-user"}}{{.Param "provisioner-default-user"}}{{else}}rocketskates{{end}}
d-i passwd/user-password-crypted password {{if .ParamExists "provisioner-default-password-hash"}}{{.Param "provisioner-default-password-hash"}}{{else}}$6$drprocksdrprocks$upAIK9ynEEdFmaxJ5j0QRvwmIu2ruJa1A1XB7GZjrnYYXXyNr4qF9FttxMda2j.cmh.TSiLgn4B/7z0iSHkDC1{{end}}
d-i user-setup/allow-password-weak boolean true
d-i user-setup/encrypt-home boolean false
```

shane
2018-01-22 23:30
this drops in a "Default User" with name `rocketskates` - overriding the built-in default of `ubuntu`

shane
2018-01-22 23:30
you can change it by setting the Param `provisioner-default-user`
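
The fallback chain in that preseed snippet is easy to misread when it is all on one line, so here is the same logic as a Python sketch. The param names and default values are taken from the snippet above. The function name and the plain dict standing in for DRP params are illustrative only.

```python
def default_user_fields(params):
    """Mirror the ParamExists/else chains of the 'Default User Setup' block."""
    uid = params["provisioner-default-uid"] if "provisioner-default-uid" in params else 1000

    # Full name falls back to the user name param, then to a fixed default.
    if "provisioner-default-fullname" in params:
        fullname = params["provisioner-default-fullname"]
    elif "provisioner-default-user" in params:
        fullname = params["provisioner-default-user"]
    else:
        fullname = "Rocket Skates"

    username = params.get("provisioner-default-user", "rocketskates")
    return {"uid": uid, "fullname": fullname, "username": username}

print(default_user_fields({}))
print(default_user_fields({"provisioner-default-user": "drpops"}))
```

Setting only `provisioner-default-user` changes both the login name and the displayed full name, which is usually the effect wanted.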

shane
2018-01-22 23:31
post-provisioning to add more users should be done as a Unique stage to your environment

ctrees
2018-01-22 23:31
so access-keys is JUST for root ?

shane
2018-01-22 23:31
or - we'd suggest other Configuration Management tools as a better approach

shane
2018-01-22 23:32
you shouldn't bake user logic in to your Kickseeds (kickstarts/preseeds)

ctrees
2018-01-22 23:32
ok... I get that...

ctrees
2018-01-22 23:34
I just went through both krib and kubespray and sort of wondering ansible passoff OR ... I take it rob is thinking 'kubectl' (no ssh) but... blah blah... slowly getting head around so much flexibility...

shane
2018-01-22 23:35
if you look at `drpcli templates show access-keys.sh.tmpl --format=yaml` you'll see the `access-ssh-root-mode` does indeed relate to "root" user only policy and the interesting bit:
```
{{if .ParamExists "access-keys"}}
echo "Putting ssh access keys for root in place"
mkdir -p /root/.ssh
cat >>/root/.ssh/authorized_keys <<EOFSSHACCESS
### BEGIN Access Keys GENERATED CONTENT
{{range $key := .Param "access-keys"}}
{{$key}}
{{end}}
### END Access Keys GENERATED CONTENT
EOFSSHACCESS
chmod 600 /root/.ssh/authorized_keys
{{end}}
```

shane
2018-01-22 23:35
which you see is only hacking the `/root/.ssh/authorized_keys` file

shane
2018-01-22 23:36
it'd be pretty trivial to copy-cat this and apply to the default user specified via the `provisioner-default-user` (or fallback to `rocketskates` username if not defined) to set this user SSH keys
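
A rough Python sketch of what that template produces, and of the "copy-cat for the default user" idea. The marker lines come from the template above. The `/home/<user>` location for a non-root user is an assumption, as are the function names. Go templates visit map entries in sorted key order, so the sketch sorts the key names as well.

```python
def authorized_keys_block(access_keys):
    """Render the generated block from an access-keys map of name -> public key."""
    lines = ["### BEGIN Access Keys GENERATED CONTENT"]
    for name in sorted(access_keys):   # the name is only a label, the value is the key
        lines.append(access_keys[name])
    lines.append("### END Access Keys GENERATED CONTENT")
    return "\n".join(lines) + "\n"

def authorized_keys_path(user="root"):
    """Where the block would be appended. The non-root path is an assumption."""
    if user == "root":
        return "/root/.ssh/authorized_keys"
    return f"/home/{user}/.ssh/authorized_keys"

keys = {"drpops": "ssh-rsa AAAA...example drpops@example.test"}  # placeholder key
print(authorized_keys_path("rocketskates"))
print(authorized_keys_block(keys), end="")
```

A real stage would also need to create the `.ssh` directory with the right owner and set mode 600 on the file, as the original template does for root.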

shane
2018-01-22 23:36
that would be the preferred model on the Ubuntu side since that's their "security" model ... never mind that the user has full `sudo` access

shane
2018-01-22 23:37
but in theory that's a bit more protected since the default `sudo` access does require user password to authenticate `sudo` usage

ctrees
2018-01-22 23:40
yea and I notice the use of yaml to create yaml to ingest yaml elsewhere... I'm sure you guys all have 'spinning tops' (cheap inception joke)

ctrees
2018-01-22 23:41
that threw me till @vlowther explained | vs |+ (I think.... still just theory in my head)

kamp.scott
2018-01-23 01:19
can we deploy rancheros with rebar ?

shane
2018-01-23 01:23
you surely can! You'll need to do some work to create a BootEnv to do this - we don't have a stock one

shane
2018-01-23 01:23
details on how to make PXE boot w/ RancherOS is available at: http://rancher.com/docs/os/v1.1/en/running-rancheros/server/pxe/

shane
2018-01-23 03:15
ok @kamp.scott - if you are willing to experiment and hack around a bit - here is a basic RancherOS set of bootenv/stages that works.
WARNING WARNING: THIS WILL NOT WORK ON YOUR METAL
this was tested by "borrowing" the http://Packet.net provisioning script - to provision this (successfully) in http://Packet.net environment
you WILL HAVE TO MODIFY the `bootenvs/rancheros.yml` kernel options to get a different config file you define
you WILL HAVE TO MODIFY the `rancher-packet-provisioning-script.sh` to work for your metal and environment
WARNING: I've never used Rancher before - but I was able to boot this against DRP endpoint in http://packet.net without any problems


shane
2018-01-23 03:17
_IF_ you had a http://packet.net account you could test this and it'd work with the following commands/notes:
* for http://packet.net need `console=ttyS1,115200n8` parameter applied to machine
* untar the above TGZ
* create bootenv: `drpcli bootenvs create -< bootenvs/rancheros.yml`
* create stage: `drpcli stages create -< stages/rancheros.yml`
* add ISO image: `drpcli bootenvs uploadiso rancheros-latest-install`

shane
2018-01-23 03:22
sort out the customizations in the provisioning steps you need for your bare metal then set a machine to the `rancheros-latest-install` stage and off you go

ctrees
2018-01-23 04:02
[drpops@drpe testansible]$ RS_PROFILE=mycluster ./inventory.py | jq
Traceback (most recent call last):
  File "./inventory.py", line 17, in <module>
    import requests, argparse, json, urllib3, os
ImportError: No module named requests
[drpops@drpe testansible]$

ctrees
2018-01-23 04:03
somehow I'm messing up the dynamic inventory... I noticed rob had a ln to the inventory... is that needed (aka how stand-alone is inventory.py)

ctrees
2018-01-23 04:05
[drpops@drpe testansible]$ pip --version pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7)

greg
2018-01-23 04:06
I think you need to do `pip install requests`
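
A quick way to see which of the modules on that import line are missing before running the script. This sketch assumes Python 3 (the session above is on Python 2.7, where `importlib.util.find_spec` is not available). The module list is copied from the traceback.

```python
import importlib.util

def missing_modules(names):
    """Return the module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Modules imported on line 17 of inventory.py, per the traceback.
needed = ["requests", "argparse", "json", "urllib3", "os"]
print(missing_modules(needed))
```

Anything the function prints is a candidate for `pip install`. The standard library names (`argparse`, `json`, `os`) should never show up.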

ctrees
2018-01-23 04:09
ok thanks..

ctrees
2018-01-23 04:12
that did it.. and a pip upgrade to 9... thanks again

kamp.scott
2018-01-23 07:39
@shane cool thanks ill give it a shot on XenServer

zehicle
2018-01-23 14:18
The ln let me run it from kubespray without a long path

greg
2018-01-23 17:50
- hi all - Some issues were found with gohai on UEFI enabled systems. We've updated gohai, sledgehammer, and community content in tip. Additionally, we've added the gohai function into drpcli. So, you can now do `drpcli gohai` and it will attempt to inventory the system. This really only works on linux. The community content has also been updated to use drpcli gohai if available over gohai. This means that for gohai updates in the future we will not need to rev sledgehammer, but instead rev drpcli. What this means to you: if you update to tip community content, you will need to run: `drpcli bootenvs uploadiso sledgehammer` before booting new machines.

greg
2018-01-23 17:51
Oh - for reference, the content has been updated to allow for not updating DRP to tip while still using the tip content. If you choose to do that.

ctrees
2018-01-23 18:25
So... I took a shot and attempted to create a clone of the kubespray yaml to create another type of feed for ansible ( calling it blender2cld ) this is in prep for my visit to the animation studio

ctrees
2018-01-23 18:26
is there a 'trick' to get dr-provision to use ? (I basically just stuffed the updated yaml into saas-content )

ctrees
2018-01-23 18:28
basically the studio has an old grid of mine that they fire up when they need animation rendering so I was going to test it out there while I'm down doing other maintenance

ctrees
2018-01-23 21:04
... got it... important to maintain unique names... even read it in the arch docs...

daniel.bernier
2018-01-24 20:42
has anyone used DRP for ONIE based installs ?

greg
2018-01-24 20:52
@daniel.bernier - umm - hmm - kinda. We've talked about it in the past with some switch vendors. Started to show a path with DRv2. DRP should be similar to setup for it. Just haven't tried. You interested? Anything in particular you're trying to boot/install with ONIE?

daniel.bernier
2018-01-24 20:55
ONL

shane
2018-01-24 20:58

greg
2018-01-24 20:59
In general, it has never been much different than sledgehammer.

greg
2018-01-24 21:09
@daniel.bernier - okay - so - in first quick glance, you could have the ONIE device DHCP. Drp would provide an address and options.

daniel.bernier
2018-01-24 21:12
Yup already getting ips from DRP

greg
2018-01-24 21:12
The DRP config needs to set option 114 (default-url) to "http://DRP_IP:8091/files/NOS_image"

greg
2018-01-24 21:13
then put `NOS_image` in the files directory of DRP.

greg
2018-01-24 21:13
```drpcli files upload NOS_image as NOS_image```

greg
2018-01-24 21:14
if you already have a webserver , you can point it at that instead.


greg
2018-01-24 21:16
You should be able to check leases to find your switch and ssh into.

greg
2018-01-24 21:17
There is a lot of advanced config stuff that our DHCP server can do (kinda like ISC's) to push it.

greg
2018-01-24 21:17
The option could be set on the subnet and would likely be ignored by all things except switches.

greg
2018-01-24 21:18
Or create a reservation for that specific switch with option 114 for its need.

greg
2018-01-24 21:18
An advanced config option would be to put vendor string decomposition into the option 114 and select the right file for the right vendor type, but that is pretty hardcore.
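
For anyone curious what option 114 looks like on the wire, here is a small sketch. DHCP options are encoded as a one-byte code, a one-byte length, then the value. The URL shape follows the message above. The IP address is a documentation placeholder and the helper names are made up.

```python
def onie_default_url(drp_ip, image="NOS_image", port=8091):
    """URL an ONIE switch would fetch, served from the DRP files directory."""
    return f"http://{drp_ip}:{port}/files/{image}"

def encode_dhcp_option(code, value):
    """Encode one DHCP option as code, length, value bytes."""
    data = value.encode("ascii")
    if len(data) > 255:
        raise ValueError("a single DHCP option value is limited to 255 bytes")
    return bytes([code, len(data)]) + data

url = onie_default_url("192.0.2.10")      # placeholder address
option = encode_dhcp_option(114, url)
print(url)
print(option.hex())
```

In practice the value is set on the subnet or the reservation in DRP and the server does this encoding itself. The sketch is only meant to make the option less abstract.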

greg
2018-01-24 21:19
@daniel.bernier - make sense?

daniel.bernier
2018-01-24 21:21
@greg totally

daniel.bernier
2018-01-24 21:21
Will try it in a bit

daniel.bernier
2018-01-24 21:21
Already statically reserves but prefer the whole subnet approach

daniel.bernier
2018-01-24 21:22
Next question will be around gohai inventory :-) but that will wait for tomorrow

greg
2018-01-24 21:23
on the switch?

greg
2018-01-24 21:25
Though, it might work for just doing TFTP/http waterfall process as well.

greg
2018-01-24 21:28

daniel.bernier
2018-01-24 23:00
No gohai for existing servers

zehicle
2018-01-24 23:49
w/ the new v1.6 DRPCLI gohai command - you may be able to run it that way. would that work?

2018-01-25 14:40
Hey guys, I'm starting to test digitalrebar/provision and seems that the docs are out of sync (Compared to doc folder in GitHub repo).

shane
2018-01-25 14:51
@amontalban - please use the `latest` version - it is the most up to date in relation to the current version (v3.6.0)


2018-01-25 14:53
Great thanks :+1:

shane
2018-01-25 14:56
no problem - if you bump in to any issues or obvious errors in doc - please let me know .... we've been working on cleaning them up and enhancing them

2018-01-25 14:57
Awesome, will do

shane
2018-01-25 14:57
we'd also be happy to send you a Slack invite so you can use the native Slack app to communicate with us

2018-01-25 14:58
Sure, do you need my email or something?

shane
2018-01-25 14:59
yes - feel free to email me directly, if you'd rather not post it here ()

2018-01-25 14:59
:+1:

shane
2018-01-25 14:59
not sure if you can direct message me via the sameroom integration ... ?

2018-01-25 15:00
Seems so

shane
2018-01-25 15:01
(I didn't receive a D.M. )

2018-01-25 15:02
Alright, it's amontalban AT perceptyx DOT com

shane
2018-01-25 15:03
sent

amontalban
2018-01-25 15:04
has joined #json

shane
2018-01-25 15:04
welcome (officially...!) @amontalban

amontalban
2018-01-25 15:05
Thanks :slightly_smiling_face:

zehicle
2018-01-25 16:07
$welcome

amontalban
2018-01-25 17:35
Guys, I'm trying to have a machine use a custom bootenv (Want to install FreeBSD)

amontalban
2018-01-25 17:35
I have set the BootEnv for the machine, but for some reason it still loads sledgehammer

amontalban
2018-01-25 17:35
Any pointer?

amontalban
2018-01-25 17:44
NVM, I think I know what's going on

lae
2018-01-25 18:16
@greg upon upgrading from 3.4.1 to 3.5.0 (and 3.6.0) I'm running into what seems like it might be the same issue with there being null values in our machine's metadata

shane
2018-01-25 18:17
hmm ... @lae do you have hand built content that this is occurring in ?

shane
2018-01-25 18:17
I haven't seen the issue w/ upgrades to v3.5.0 or v3.6.0 - with Digital Rebar or RackN content

shane
2018-01-25 18:18
I don't think I've had hand hacked content that I tried upgrading around ...


shane
2018-01-25 18:18
are you generating the machine objects in advance of machines showing up - or is this generated by DRP on new machines ?

lae
2018-01-25 18:18
I removed machine definitions and it starts up fine

lae
2018-01-25 18:18
if I re-add one of them, it fails to start

lae
2018-01-25 18:19
we're usually creating machine objects with drpcli

shane
2018-01-25 18:19
def. sounds similar to the `null` issue

shane
2018-01-25 18:19
you do _have_ to create them with a value for required fields, not let it `null` out

shane
2018-01-25 18:19
but - we shouldn't let you create a machine if a required field is missing ... ?

lae
2018-01-25 18:20
are Meta and Params supposed to be required fields?

lae
2018-01-25 18:20
not sure what Meta would be set to

greg
2018-01-25 18:20
Can you send me one? They are now, but the code should have migrated them.

lae
2018-01-25 18:20
There's one at the bottom of that paste I linked

greg
2018-01-25 18:20
Params and Meta can be "{}" to start.

greg
2018-01-25 18:20
missed that.

greg
2018-01-25 18:27
Okay - I know what it is.

greg
2018-01-25 18:27
We thought we were already doing this.

lae
2018-01-25 18:33
Should I go ahead and string replace the null values with [] (by the looks of it I see null values in Meta/Errors/Params/Profiles spread across all of the machines) or just wait for an update that'll take care of validating/cleaning it up?

lae
2018-01-25 18:38
(gonna head to sleep since it's almost 4am, I'll check back later)

shane
2018-01-25 18:38
lazy bones ...

shane
2018-01-25 18:38
yes - string replace w/ `[]` should fix it

greg
2018-01-25 18:47
no - `{}`

greg
2018-01-25 18:47
well.

greg
2018-01-25 18:47
Meta, Params are `{}`

shane
2018-01-25 18:47
depends - array -vs- object

greg
2018-01-25 18:47
Errors would be `[]`
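
The object-versus-array distinction above can be applied field by field rather than with a blanket string replace. A minimal sketch, assuming a machine object has been loaded from JSON into a dict. `Meta` and `Params` becoming `{}` and `Errors` becoming `[]` follows the messages above. Treating `Profiles` as a list is an assumption.

```python
import json

# Empty value to use for each field that must not be null.
EMPTY_FOR_FIELD = {
    "Meta": dict,
    "Params": dict,
    "Errors": list,
    "Profiles": list,   # assumed to be an array of profile names
}

def fill_null_fields(machine):
    """Return a copy of the machine with explicit nulls replaced by empty values."""
    fixed = dict(machine)
    for field, factory in EMPTY_FOR_FIELD.items():
        if field in fixed and fixed[field] is None:
            fixed[field] = factory()
    return fixed

# Hypothetical machine object with the problem described above.
broken = json.loads('{"Name": "m1", "Meta": null, "Params": null, "Errors": null, "Profiles": null}')
print(json.dumps(fill_null_fields(broken)))
```

Fields that already hold a value are left alone, so running this over machines that are fine should be harmless.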

greg
2018-01-25 18:53
@lae - I'm fixing. It may be a few hours.

amontalban
2018-01-25 19:13
Guys, how can I validate the generated pxelinux file for a machine?

amontalban
2018-01-25 19:14
(My drp-data/tftpboot/pxelinux.cfg is empty)

greg
2018-01-25 19:15
it will be. It is a virtual filesystem file.

greg
2018-01-25 19:16
you will need to curl them from their expected location

greg
2018-01-25 19:16
`http://<ip>:8091/pxelinux.cfg/default`

greg
2018-01-25 19:17
That would be the one served by discovery bootenv.

greg
2018-01-25 19:17
If the machine has the bootenv set, you can go to the expanded url from the template list in the bootenv.
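
A minimal sketch of fetching the rendered file instead of looking on disk, assuming the endpoint serves it on port 8091 as in the URL above. The function names are made up for illustration, and per-machine file names are not covered here; take those from the bootenv's template list.

```python
from urllib.request import urlopen


def pxelinux_url(host, name="default", port=8091):
    """Build the URL of a rendered pxelinux config served by the endpoint."""
    return f"http://{host}:{port}/pxelinux.cfg/{name}"


def fetch_rendered(url, timeout=5):
    """Fetch the rendered file; raises urllib.error.URLError if the server is unreachable."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# 192.0.2.10 is a documentation address; substitute your endpoint's IP.
print(pxelinux_url("192.0.2.10"))
```

Calling `fetch_rendered(pxelinux_url("<ip>"))` against a live endpoint should return what a booting machine would receive, which is the thing worth validating.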

amontalban
2018-01-25 19:17
Ah alright, thanks!

amontalban
2018-01-25 19:30
BTW, would be great to have the `memdisk` file from syslinux out of the box

greg
2018-01-25 19:31
Issue, please! @amontalban

amontalban
2018-01-25 19:31
Sure, I'm trying to boot FreeBSD so once I get that will do a PR if possible :slightly_smiling_face:

shane
2018-01-25 19:32
Or better: Pull Request !! :slightly_smiling_face:

greg
2018-01-25 19:36
@lae - I think I have a fix for you in tip. It will be there in about 30 minutes.

detiber
2018-01-25 20:51
@vlowther @greg I'm hitting some issues with pxe booting with uefi, and it looks like the situation is worse post 3.5 for me, since <= 3.5 my host would attempt to use binl on 4011, and > 3.5 it just keeps resending dhcp discover on 68

detiber
2018-01-25 20:52
I think something like the uefi workflow from https://github.com/google/netboot/blob/master/pixiecore/README.booting.md might be needed for hardware like mine

detiber
2018-01-25 20:54
As an aside, the api for the dhcp library used by pixiecore looks a lot cleaner than the one that is currently being used, but it isn't a drop in replacement :slightly_smiling_face:

greg
2018-01-25 20:56
Yeah. I thought about it. You mean the krakow vs pixiecore parts, I assume.

detiber
2018-01-25 20:56
@greg indeed, I started to try and take a hack at swapping out the libraries, but ended up in a rabbit hole that I wasn't prepared for :slightly_smiling_face:

greg
2018-01-25 20:58
I had the pixie UEFI path, but when Victor tested it on our UEFI system it didn't work, and he altered it to work in our lab. What hardware are you running? This is really hard on a phone with autocorrect.

detiber
2018-01-25 20:59
I haven't fully diagnosed what is going on yet, but it appears that I have at least 2 boxes that work with pixiecore but not dr-provision. One is a LivaX (http://www.ecs.com.tw/ECSWebSite/Product/Product_LIVA.aspx?DetailID=1593&LanID=0) and the other is a frankenbox using this MB: https://www.asus.com/us/Motherboards/SABERTOOTH_X79/

greg
2018-01-25 21:01
Okay. Was 3.5 working?

detiber
2018-01-25 21:02
No, but 3.5 seemed to attempt to boot using binl on port 4011, but still failed. With newer builds it just keeps resending dhcp discover packets on 68

vlowther
2018-01-25 22:15
Hm.

vlowther
2018-01-25 22:16
Do you have a pxe stack that works with that gear?

vlowther
2018-01-25 22:17
It would be good to get a packet trace of all the ports involved.

vlowther
2018-01-25 22:19
I have also been working on a DHCP stack refactor, so that would be a good Branch to test with.

vlowther
2018-01-25 22:21
https://github.com/digitalrebar/provision/pull/649 would be a good branch to test with.

lae
2018-01-26 06:04
fix in tip appears to work - although I do still see the null values in the machine objects

lae
2018-01-26 06:04
@greg

greg
2018-01-26 14:14
Yeah it is fixed on load. As the machines get saved. They will change over time.

greg
2018-01-26 14:14
@lae

lae
2018-01-26 14:19
kk

amontalban
2018-01-26 16:51
Hey guys, why are machines indexed by a random UUID and not by the system UUID (like the one inside gohai-inventory)?

shane
2018-01-26 16:52
not every operating system or hardware vendor provides a reliable UUID to use

shane
2018-01-26 16:52
we need to ensure consistency across all hardware and operating system platforms

shane
2018-01-26 16:53
gohai inventory is showing you the hardware/bios generated system UUID

shane
2018-01-26 16:53
the Digital Rebar Provision ID is guaranteed to be unique across all hardware/OS types we come across

amontalban
2018-01-26 16:54
I see, thanks for the explanation :+1:

shane
2018-01-26 16:54
no problem

vlowther
2018-01-26 17:03
ya, I have seen cases where the system UUID is all zeros, or some crazy stuff like 1234-56-7-8901234, or "ToBe Filled In by O.E.M", among other things.

vlowther
2018-01-26 17:04
So for now we just don't rely on it.
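
To illustrate why the firmware value is a poor key, here is a small Python sketch that rejects the kinds of system UUIDs described above. It is not how dr-provision assigns IDs (it generates its own, as explained); the function name and the sample list are made up, and even a well-formed value can still be duplicated across boxes.

```python
import re
import uuid

UUID_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$")


def usable_system_uuid(value):
    """Return True only for a well-formed, non-zero UUID string."""
    text = (value or "").strip().lower()
    if not UUID_RE.match(text):
        return False  # malformed values and O.E.M. placeholder strings
    return uuid.UUID(text).int != 0  # the all-zeros UUID


samples = [
    "00000000-0000-0000-0000-000000000000",
    "1234-56-7-8901234",
    "ToBe Filled In by O.E.M",
    "09ae3ae4-095f-40e8-a544-6ac0aa336a30",
]
for sample in samples:
    print(f"{sample!r}: usable={usable_system_uuid(sample)}")

# A randomly generated ID sidesteps the firmware value entirely.
print("generated:", uuid.uuid4())
```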

amontalban
2018-01-26 17:12
Guys, don't hate me but I'm getting a panic error, what information is needed for opening an issue besides the log itself? (It's inside TFTP server)

amontalban
2018-01-26 17:19
Mmm, might be fixed by @vlowther here https://github.com/digitalrebar/provision/pull/662

vlowther
2018-01-26 17:20
No, that will just make the panic contain the information I need to debug it. :grinning:

amontalban
2018-01-26 17:21
Ah :slightly_smiling_face:

vlowther
2018-01-26 17:21
So if you are getting that panic after updating to the latest tip, pm me the panic message.

amontalban
2018-01-26 17:21
Awesome, thanks

amontalban
2018-01-26 17:22
First I have to check how to update to master

shane
2018-01-26 17:52
@amontalban we don't cut compiled releases against `master` - we have `tip` which sits _slightly_ behind master - we do release compiled versions for `tip`. Once we've done basic sanity checking and minimal testing - we set the Build system to point `tip` at a specific commit.

shane
2018-01-26 17:53
So ... until PR 662 is included in `tip` - you can't get binaries from us

shane
2018-01-26 17:53
...you can... if you are enterprising enough ... compile your own binaries from `master` yourself

shane
2018-01-26 17:54

shane
2018-01-26 17:55
(you need to have Go 1.9.0 or newer setup and working first, etc... )

amontalban
2018-01-26 17:55
Awesome, thanks :+1:

vlowther
2018-01-26 18:30
Tip should have that PR now, BTW.

vlowther
2018-01-26 18:41
@amontalban Did that help?

amontalban
2018-01-26 18:45
Still setting up everything again

amontalban
2018-01-26 18:47
Ok, it crashed again. Should I reach you over PM?

vlowther
2018-01-26 19:01
PM me the stack trace.

detiber
2018-01-26 19:02
@vlowther pixiecore works on my hardware, but I'm trying to avoid writing the workflow stuff myself and would rather use dr-provision. I'll work on getting some packet traces together using both dr-provision and pixiecore later today

vlowther
2018-01-26 19:03
Cool. tcpdump in as much detail as you can, raw DHCP packets if you can get them would be appreciated.

vlowther
2018-01-26 19:04
The current tip is known to work on Dell T320 and R720 gear, and one of our customers has reported that recent HP and Supermicro gear works as well.

2018-01-26 19:04
Time to feed the :bear:!

wdennis
2018-01-26 20:04
@zehicle Having a problem with the UX's "Org. Name & Endpoints" screen; trying to remove a duplicate endpoint (I have two that are the same system for some reason) and when I click the Remove button on the one I want deleted, and then click the bottommost Save button, nothing happens...

wdennis
2018-01-26 20:05
(i.e. I can't delete the endpoint)

wdennis
2018-01-26 20:05
Tried with Safari 11.0.3, and Chrome 64.0.3282.x

wdennis
2018-01-26 20:12
Another UX issue...

wdennis
2018-01-26 20:13
When I log into my newly-upgraded DRP 3.6 system, I see the upgrade notif's as so:


wdennis
2018-01-26 20:14
But when I go to "Content" to upgrade them, the "Update" buttons are not available...


zehicle
2018-01-26 20:15
Checking.... @wdennis

zehicle
2018-01-26 20:20
@wdennis for the endpoint delete thing... you had a very early account with the old naming convention. should be fixed now

wdennis
2018-01-26 20:21
Oh, OK - I was trying to get rid of the 1st one...

wdennis
2018-01-26 20:23
@shane What is the new param I can use in a profile to specify a custom preseed template?

wdennis
2018-01-26 20:24
Not seeing anything when I drop the "Choose undefined..." box

greg
2018-01-26 20:24
Check the params in the UX. I think it is kickseed or something like that

wdennis
2018-01-26 20:25
@greg Does that come in with the newer `drp-community-content` that I can't seem to be able to update to?

greg
2018-01-26 20:26
yes - it is part of the community content update.

wdennis
2018-01-26 20:27
OK, that's why...

wdennis
2018-01-26 20:27
Is there a way to update the content via `drpcli`?

wdennis
2018-01-26 20:29
Is it `drpcli contents update [id] [json] [flags]`? If so, how to get remote content?

zehicle
2018-01-26 20:34
@wdennis I've duplicated the content screen issue - looking at the problem

wdennis
2018-01-26 20:46
@zehicle Thx

zehicle
2018-01-26 20:47
FWIW - this issue exposes that we can detect both minor (intra version) and major (extra version) changes. In this case, the content page says that no patches (intra) are needed and is overlooking the upgrade.
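
A rough Python sketch of the two cases, assuming version strings shaped like the ones quoted in this channel (`v3.6.0-0-<hash>`, `v1.1.0-17-<hash>`). The meaning given to "patch" (same base version, more commits) and "upgrade" (newer base version) is an assumption about what intra and extra version mean here, and none of this is the UX code.

```python
import re

VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-(\d+)-([0-9a-f]+))?$")


def parse_version(text):
    """Split 'v1.1.0-17-<hash>' into ((1, 1, 0), 17); the commit count defaults to 0."""
    m = VERSION_RE.match(text)
    if not m:
        raise ValueError(f"unrecognized version: {text!r}")
    base = tuple(int(x) for x in m.group(1, 2, 3))
    commits = int(m.group(4) or 0)
    return base, commits


def classify_update(installed, available):
    """Return 'upgrade', 'patch', or 'none' (assumed meanings, see the note above)."""
    inst_base, inst_commits = parse_version(installed)
    avail_base, avail_commits = parse_version(available)
    if avail_base > inst_base:
        return "upgrade"  # extra-version change
    if avail_base == inst_base and avail_commits > inst_commits:
        return "patch"  # intra-version change
    return "none"


print(classify_update("v1.1.0-17-7040582223c11766fcb741ac9436f17c486e271b", "v1.5.0"))
```

A page that only checks the "patch" branch would report nothing to do for v1.1.0 to v1.5.0, which matches the symptom described.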

wdennis
2018-01-26 20:49
@zehicle But in this case, that's wrong, correct? There is a major update...

zehicle
2018-01-26 20:49
yes. working to add buttons for both cases

wdennis
2018-01-26 20:50
Also, verified that the endpoint save is working - renamed the one endpoint I had left, and it did save. However, no feedback when the "Save" button is clicked - could that be added somehow?

shane
2018-01-26 20:50
@wdennis - the Param you're looking for is indeed `kickseed` - it can be used interchangeably for Kickstart definitions or Preseed (hence its munged name)

wdennis
2018-01-26 20:50
@shane thx, can't wait to use it!



wdennis
2018-01-26 20:51
(Actually, I do have a system to install, so hope I don't have to wait too long :wink: )

shane
2018-01-26 20:51
(and, I know about the spelling error in the note box ... )

wdennis
2018-01-26 20:52
Nice -- added `jq` usage examples! That's great

wdennis
2018-01-26 20:54
@shane one other thing -- I think it is now possible to "nest" templates, for instance a disk-partitioning template in the bigger kickseed template... Is that documented somewhere?

wdennis
2018-01-26 20:54
I think @lae pioneered that...

shane
2018-01-26 20:55
there is a constant stream of updates in the background, to the Latest doc :slightly_smiling_face:

shane
2018-01-26 20:55
I'm not sure if that's added to Doc yet - there were significant changes to the Architecture (data) docs - @vlowther worked up a lot of new info there

shane
2018-01-26 20:56
all of this stuff got a lot of updates: http://provision.readthedocs.io/en/latest/doc/arch.html

shane
2018-01-26 21:01
@wdennis I think this might be it? `{{template "something.tmpl" .}}` Will call another template named `something.tmpl` and expand it inline

shane
2018-01-26 21:01
that's the golang template pattern you'd use

wdennis
2018-01-26 21:04
OK, cool

greg
2018-01-26 21:07
yes - template or .CallTemplate can be nested.

wdennis
2018-01-26 21:10
I actually see this in the custom seed template I made some time ago:

wdennis
2018-01-26 21:11

wdennis
2018-01-26 21:20
@greg That's what you are talking about?

greg
2018-01-26 21:22
yes

greg
2018-01-26 21:23
`template` only takes a hard coded string. golang text template definition.

greg
2018-01-26 21:23
`.CallTemplate` takes something that is a string. So, it can be a parameter or expression that evaluates into a string.

wdennis
2018-01-27 02:30
Looks like the param is actually named `select-kickseed` - I can see it in the params list when I "preview" the updated Community Content

wdennis
2018-01-27 02:31
So close, yet so far away...

greg
2018-01-27 17:22
- when people get a chance, can they PM me if they are using plugins?

kamp.scott
2018-01-27 17:24
Does KRIBs count

greg
2018-01-27 17:24
no - those are content packages.

greg
2018-01-27 17:25
this would be: incrementer, ipmi, packet-ipmi, virtualbox-ipmi, slack

greg
2018-01-27 17:25
or one you created. If you created a plugin, lord, please talk to me now. :slightly_smiling_face:

wdennis
2018-01-27 18:52
@greg ipmi (bare metal)

greg
2018-01-28 03:21
- tip has been updated with new plugins support.

greg
2018-01-28 03:22
This means that you will need to update your plugins immediately after updating to tip.

greg
2018-01-28 03:22
I think this really only affects @wdennis

greg
2018-01-28 03:22
Just use the SaaS tip. I think.

wdennis
2018-01-28 03:23
@greg Not running tip, but stable - v3.6.0-0-0e5ccf678a3e5b5fdb10f86261247cd28c858ac0

greg
2018-01-28 03:23
I have not pushed this to a release.

wdennis
2018-01-28 03:24
Waiting for @zehicle to fix issue in UX

greg
2018-01-28 03:24
Okay - so you are fine. Don't update plugins (to tip) without updating DRP.

wdennis
2018-01-28 03:25
ACK

zehicle
2018-01-28 03:30
working on it...

wdennis
2018-01-28 03:44
If there was another way to get updated content for v3.6.0, I wouldn't be so worried about it - want to be able to use a custom kickseed to roll out machines; need the new param to do that...

wdennis
2018-01-28 03:45
So if there's another way, let me know

shane
2018-01-28 18:20
@wdennis does this FAQ answer your question? It outlines how to use the CLI to download and apply Content upgrade... http://provision.readthedocs.io/en/latest/doc/faq-troubleshooting.html#update-community-content-via-command-line

zehicle
2018-01-29 05:41
Content upgrade by version is being tested internally. Hopefully available for advanced users tomorrow.

zehicle
2018-01-29 05:43
We'll talk about the new RackN UX stages in the community call on Tuesday. The short version is that we're moving to release stages where we do not roll new UX code into full production in a single step. https://portal.rackn.io will have the most stable version of the UX. New features will surface in https://latest.rackn.io for earlier testing.

wdennis
2018-01-29 14:48
Thx @zehicle!! Sounds like a better plan for UX changes.

wdennis
2018-01-29 14:50
@shane are you available?

shane
2018-01-29 14:50
what's up ?


shane
2018-01-29 14:51
fire away

wdennis
2018-01-29 14:52
1) I think there's a missing `drpcli` in the example shown under "View our currently installed Content version:"

shane
2018-01-29 14:52
ah - yep - when I cut-n-pasted the command, I removed my Prompt string ... one too many "dw" commands :disappointed:

wdennis
2018-01-29 14:53
2) Why the `export VER=..."xxx"` before the `curl`?

shane
2018-01-29 14:53
when I do doc - I try to separate out "stuff" that is dynamic as a variable - it makes cut-n-paste of commands easier if you're following along, IMO

shane
2018-01-29 14:54
you can "set" the variable - and cut-n-paste the command w/out change

wdennis
2018-01-29 14:54
3) Under that, `No update the content.` --> `Now update the content.`

shane
2018-01-29 14:54
it also allows re-use across different use tests with tweaking the Var - also if it gets embedded in a script - those dynamic pieces often change, so you want to separate it as a Var/Param

wdennis
2018-01-29 14:55
So I could (should) do: `export VER="stable"` if I'm running v3.6.0 stable?

shane
2018-01-29 14:55
yep

wdennis
2018-01-29 14:56
Thx

shane
2018-01-29 14:56
we also have some catalog changes that emit all version info as well - that will be out soon

shane
2018-01-29 14:56
we'll talk about that tmw at meetup

wdennis
2018-01-29 14:57

shane
2018-01-29 14:57
yep :slightly_smiling_face:

wdennis
2018-01-29 14:58
OK, let's give it a try...

shane
2018-01-29 14:58
ok - let me know if any other changes needed - I have those changes staged and ready to push

wdennis
2018-01-29 15:03
When it says `NOTE that content that is marked Writable may need to be destroyed, and recreated if it's currently in use on other objects. For Read Only content you can safely update the content.` - does that mean if the object attribute `ReadOnly:` is set to `false` then that == "content that is marked Writable"?

shane
2018-01-29 15:04
yep

wdennis
2018-01-29 15:05
So if I just try `drpcli contents update drp-community-content -< drp-cc.yaml` will it error out if one of the objects is currently in use by something? Or will it update?

shane
2018-01-29 15:06
Community Content is readonly - so you can safely update it - that is by design

shane
2018-01-29 15:06
:slightly_smiling_face:

shane
2018-01-29 15:06
The whole purpose of our layered filesystem model is to have a layer as ReadOnly that can be safely updated

wdennis
2018-01-29 15:06
OK

shane
2018-01-29 15:07
I've amended the note to be a little more clear with that *field* value of `ReadOnly`

shane
2018-01-29 15:09
also remember - you can stop DRP, backup the `/var/lib/dr-provision` or `<wherever>/drp-data` directory, and restart - then apply content changes ... ultimately content packs are stored in the `saas-content` directory under each location ...

wdennis
2018-01-29 15:10
I'm doing the `drpcli contents update drp-community-content -< drp-cc.yaml` step but it's just hanging...

shane
2018-01-29 15:11
any messages in your dr-provision log ? that should return quickly - when I tested it saturday it only took a second or so

shane
2018-01-29 15:12
if you have an ISO upload or something happening at the same time - the content layer might be locked from writing (to disk) - so it'll block waiting for that to finish

wdennis
2018-01-29 15:13
This is last few lines of log:

shane
2018-01-29 15:13
so - I just ran those steps - and it ran smoothly for me

shane
2018-01-29 15:14
I'm using a stable DRP endpoint (v3.6.0)

wdennis
2018-01-29 15:14

shane
2018-01-29 15:14
with CC to v1.5.0 update

shane
2018-01-29 15:14
those are the basic annoying audit log messages

shane
2018-01-29 15:15
are you running drpcli from your endpoint - or using an admin laptop/system to point at remote endpoint ?

wdennis
2018-01-29 15:15
drpcli on endpoint

shane
2018-01-29 15:15
do you have RS_ENDPOINT variable set to something else ? or a different user/pass pair?

wdennis
2018-01-29 15:17
no, `RS_KEY` set to default , no `RS_ENDPOINT` set

wdennis
2018-01-29 15:19
OK; stopped/started DRP, and then did following:


shane
2018-01-29 15:20
there you go

shane
2018-01-29 15:20
that's success

wdennis
2018-01-29 15:21
Looks like prior attempt did update at least some things; ver was `"v1.1.0-17-7040582223c11766fcb741ac9436f17c486e271b"` before

shane
2018-01-29 15:21
did you try any other drpcli commands while it was "paused" (from another shell session, etc) ??

shane
2018-01-29 15:21
eg. was DRP responding to other commands ?

shane
2018-01-29 15:22
you'll need to update your sledgehammer too (`drpcli bootenvs uploadiso sledgehammer`)

shane
2018-01-29 15:23
most of those warnings are just validation checks - since you don't have those BootEnvs installed, it'll chuck a Warning

wdennis
2018-01-29 15:23
Yes, drpcli was responding properly before I tried the update cmd

shane
2018-01-29 15:23
ok

wdennis
2018-01-29 15:23
After the failed update, though, it was not

wdennis
2018-01-29 15:23
That's why I stopped/restarted DRP (running isolated)

shane
2018-01-29 15:23
that was what I wanted to know

wdennis
2018-01-29 15:25
So there's a newer SH needed for v1.5.0 community-content?

shane
2018-01-29 15:25
yep - you'll see the Warnings related to SH - errors validating since it's missing

shane
2018-01-29 15:26
assuming you have defaultBootEnv/defaultStage/unknownBootEnv using sledgehammer pieces, those won't work until it's updated

wdennis
2018-01-29 15:26
In the UX Bootenvs screen, SH has the "blue check of happiness"

wdennis
2018-01-29 15:27
`sledgehammer-f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15.tar`

shane
2018-01-29 15:27
hmm - I would suggest updating sledgehammer, as we don't test new content w/ old sledgehammer - if content relies on changes in sledgehammer, that can cause unintended side effects

wdennis
2018-01-29 15:28
So, I have to d/l tar file, then do `drpcli bootenvs uploadiso sledgehammer`? Or does that cmd do the d/l?

shane
2018-01-29 15:28
if you have the f5ff ... I think that's latest - checking

wdennis
2018-01-29 15:28
Thx

shane
2018-01-29 15:29
your errors indicated you don't have the TAR ball for it, so it failed to explode it

shane
2018-01-29 15:29
``` "Explode ISO: iso does not exist: /isos/sledgehammer-f5ffd3ed10ba403ffff40c3621f1e31ada0c7e15.tar\n",```

shane
2018-01-29 15:30
did you explode it, then remove the tarball ? It'd be in the `tftpboot/isos/` directory

wdennis
2018-01-29 15:30
huhwat


shane
2018-01-29 15:32
huh - odd it'd chuck that warning then ...

shane
2018-01-29 15:32
I'll open an issue about that

wdennis
2018-01-29 15:32
OK, thx

shane
2018-01-29 15:33
everything else look good in the doc ? (I did fix another typo "brief example o how to" ... )

shane
2018-01-29 15:33
otherwise, I'll push the update

wdennis
2018-01-29 15:34
Don't see it in `v: tip` - is that where I can proof?

shane
2018-01-29 15:35
hmm - that might be old warnings from previously - versus newly generated warnings

shane
2018-01-29 15:35
I haven't pushed the change yet

shane
2018-01-29 15:35
it'd be in "latest" doc when I make the push - `tip` provision won't be updated until we move the commit pointer ...

shane
2018-01-29 15:36
I can push the branch, and you can review the branch on github before I merge it

wdennis
2018-01-29 15:36
OK


wdennis
2018-01-29 15:42
Not sure of syntax, but is there a space needed with the `-<` in ` $ drpcli contents update drp-community-content -< drp-cc.yaml`?

wdennis
2018-01-29 15:42
Like `- <`

shane
2018-01-29 15:43
nope not needed


wdennis
2018-01-29 15:46
OK, looks good to me then

wdennis
2018-01-29 15:50
@shane One more quick q: The value of `select-kickseed` should be a template ID, right? (such as, `necla-ubu-seed.tmpl`)

shane
2018-01-29 15:58
It's just a Parameter - so you can use it or set it wherever you want essentially. You can apply the Param directly to a machine: `drpcli machines set "09ae3ae4-095f-40e8-a544-6ac0aa336a30" param select-kickseed to "my-wondrous-kickstart.ks"` or add it to a Profile, and apply that profile to a Machine

shane
2018-01-29 15:59
ultimately - this can point to a template

shane
2018-01-29 15:59
(the value of the Param)

shane
2018-01-29 16:00
(yes, a Template name is the ID you'd use to reference)

wdennis
2018-01-29 16:01
Cool, thx

wdennis
2018-01-29 16:16
OK, so, problems...

wdennis
2018-01-29 16:17
Here's my machine object (target of custom kickseed install):


wdennis
2018-01-29 16:19
Seeing this at PXE install:


shane
2018-01-29 16:24
can you render the template/seed via the HTTP server ?

wdennis
2018-01-29 16:24
No

wdennis
2018-01-29 16:24
Comes back blank

shane
2018-01-29 16:24
ok - I can give you a hand in a bit - in a mtg now, and then I have to get some breakfast

wdennis
2018-01-29 16:24
K

wdennis
2018-01-29 16:29
Oh wait, maybe my bad...

wdennis
2018-01-29 16:29
Seeing this in logs:

wdennis
2018-01-29 16:30

wdennis
2018-01-29 16:34
Guess I named it wrong:

wdennis
2018-01-29 16:34
```
[dradmin@dr-admin drp]$ drpcli templates show part-scheme-separate_home.tmpl
{
  "Available": true,
  "Contents": "{{if .ParamExists \"operating-system-disk\" -}}\nd-i partman-auto/disk string /dev/{{.Param \"operating-system-disk\"}}\nd-i grub-installer/choose_bootdev select /dev/{{.Param \"operating-system-disk\"}}\nd-i grub-installer/bootdev string /dev/{{.Param \"operating-system-disk\"}}\n{{else -}}\nd-i partman-auto/disk string /dev/sda\nd-i grub-installer/choose_bootdev select /dev/sda\nd-i grub-installer/bootdev string /dev/sda\n{{end -}}\nd-i partman-auto/method string lvm\nd-i partman-auto-lvm/guided_size string max\nd-i partman-auto-lvm/new_vg_name string {{.Machine.ShortName}}\nd-i partman-auto/choose_recipe select custom_lvm\nd-i partman/auto expert_recipe string \\\n custom_lvm:: \\\n 500 50 1024 free $iflabel{ gpt } $reusemethod{ } method{ efi } format{ } . \\\n 128 50 256 ext2 $defaultignore{ } method{ format } format{ } use_filesystem{ } filesystem{ ext2 } mountpoint{ /boot } . \\\n 10240 20 1228800 ext4 $lvmok{ } mountpoint{ / } lv_name{ root } in_vg{ {{.Machine.ShortName}} } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \\\n 10240 100 10240000000 ext4 $lvmok{ } mountpoint{ /home } lv_name{ home } in_vg{ {{.Machine.ShortName}} } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \\\n 50% 20 100% linux-swap $lvmok{ } lv_name{ swap } in_vg{ {{.Machine.ShortName}} } method{ swap } format{ } .\nd-i grub-installer/only_debian boolean true\n",
  "Description": "",
  "Errors": [],
  "ID": "part-scheme-separate_home.tmpl",
  "Meta": {},
  "ReadOnly": false,
  "Validated": true
}
```

wdennis
2018-01-29 16:36
The default one is named `part-scheme-default.tmpl` so I guessed my custom one s/b named `part-scheme-XXXXX.tmpl`

wdennis
2018-01-29 16:37
So they need to be named `part-seed-XXXXX.tmpl` right?

wdennis
2018-01-29 16:53
Naming is hard...

greg
2018-01-29 17:13
For using the `select-kickseed` variable. It is the name of the template in total.

greg
2018-01-29 17:13
```{{$selectKickSeed := (printf "%s" (.Param "select-kickseed")) -}} {{.CallTemplate $selectKickSeed .}}```

greg
2018-01-29 17:15
For using the ubuntu/debian `part-scheme` variable, you need to have a template with the full name `part-seed-<var>.tmpl`, where `<var>` is the value of the `part-scheme` variable.
```
{{$templateName := (printf "part-seed-%s.tmpl" (.Param "part-scheme")) -}}
{{.CallTemplate $templateName .}}
```

greg
2018-01-29 17:15
@wdennis - okay?

wdennis
2018-01-29 17:33
OK now, got by that problem by cloning `part-scheme-separate_home.tmpl` to `part-seed-separate_home.tmpl`

wdennis
2018-01-29 17:34
But did you guys come up with the `{{$templateName := (printf "part-seed-%s.tmpl" (.Param "part-scheme")) -}}` line?

wdennis
2018-01-29 17:35
I would have guessed (and did!) that `printf "part-seed-%s.tmpl"` would have been `printf "part-scheme-%s.tmpl"` to fit with the pattern of the default one
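
The lookup that tripped things up can be mirrored in a few lines of Python. This only mimics the `printf` line quoted above; the set of available names is just the two template names mentioned in this conversation, and the function name is made up.

```python
def part_seed_template(part_scheme):
    """Mirror of: printf "part-seed-%s.tmpl" (.Param "part-scheme")."""
    return "part-seed-%s.tmpl" % part_scheme


# Template names mentioned in this conversation before the clone/rename.
available = {"part-scheme-default.tmpl", "part-scheme-separate_home.tmpl"}

wanted = part_seed_template("separate_home")
print(wanted, "found" if wanted in available else "missing")
```

With `part-scheme` set to `separate_home` the computed name is `part-seed-separate_home.tmpl`, so a template saved as `part-scheme-separate_home.tmpl` is never found.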

shane
2018-01-29 17:41
the Nested Templates FAQ sections contains that notation



wdennis
2018-01-29 18:01
"RTFM" :joy:

wdennis
2018-01-29 18:02
Well, anyways, with that fixed, it works!

shane
2018-01-29 18:03
xclnt

wdennis
2018-01-29 18:03
However, now I have preseed problems :disappointed:

wdennis
2018-01-29 18:04
a) had `d-i passwd/make-user boolean false` in preseed, but installer stopped and made me set up local user...

wdennis
2018-01-29 18:05
b) my custom disk partitioning scheme didn't work as expected...

wdennis
2018-01-29 18:07
(I know these aren't DRP/RackN problems)

wdennis
2018-01-29 18:23
Here *is* a DRP problem, though:

wdennis
2018-01-29 18:24
I have the `ipmi` plugin set up and (was) working; I have been trying to powercycle a machine like so:


wdennis
2018-01-29 18:25
Output looks OK to me, but - no powercycle...

wdennis
2018-01-29 18:25
If I do it "manually" via:

greg
2018-01-29 18:26
What version of DRP? What version of ipmi plugin? If you are at DRP tip, then you need to update the plugin provider or it won't get loaded.

wdennis
2018-01-29 18:26

wdennis
2018-01-29 18:26
it does work...

greg
2018-01-29 18:26
In the plugin view, does it show the ipmi plugin available?

wdennis
2018-01-29 18:27
On DRP 3.6 stable, I thought did NOT need to update plugins for that...

greg
2018-01-29 18:27
In the plugin-provider view, does it show ipmi installed?

greg
2018-01-29 18:27
You do not.

wdennis
2018-01-29 18:27
Yes, In UX "Plugins", `ipmi` is showing w/ blue checkmark

greg
2018-01-29 18:28
```drpcli machines action 00c933b9-b044-45e9-9c2e-d05abdd8c9c4 powercycle```

greg
2018-01-29 18:28
Shows you how to call it and what is required.

greg
2018-01-29 18:29
```drpcli machines runaction 00c933b9-b044-45e9-9c2e-d05abdd8c9c4 powercycle```

greg
2018-01-29 18:29
actually runs the action

wdennis
2018-01-29 18:32
And the student was enlightened

wdennis
2018-01-29 20:23
@zehicle More UX wonkiness: When I do an ipmi action on a host, it's throwing an error (but the ipmi does work):

wdennis
2018-01-29 20:24

vlowther
2018-01-29 20:25
welp, which machine wants the profile named ''?

wdennis
2018-01-29 20:33
@vlowther All my machines have a Profile block as so: ```"Profile": { "Available": false, "Description": "", "Errors": [], "Meta": {}, "Name": "", "Params": null, "ReadOnly": false, "Validated": false },```

greg
2018-01-29 20:34
yes - they do. The question is what do the machines Profiles array have in it.

wdennis
2018-01-29 20:34
Ah, let's get that then...

wdennis
2018-01-29 20:36
They all have something there...
```
[dradmin@dr-admin drp]$ drpcli machines list | jq '.[].Profiles'
[ "necla-ubuntu-default" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
[ "k8s-cluster1" ]
```
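
The same check can be scripted. Below is a Python sketch that scans machine objects for an empty profile name in their `Profiles` array, which is what a lookup for the profile named '' would point to. The sample data and machine names are made up; in practice the input would be the JSON printed by `drpcli machines list`.

```python
import json


def machines_with_blank_profiles(machines):
    """Return names of machines whose Profiles list contains an empty name."""
    flagged = []
    for m in machines:
        profiles = m.get("Profiles") or []
        if any(not p for p in profiles):
            flagged.append(m.get("Name", "<unnamed>"))
    return flagged


# Stand-in for the output of `drpcli machines list`; machine names are hypothetical.
sample = json.loads(
    '[{"Name": "a", "Profiles": ["necla-ubuntu-default"]},'
    ' {"Name": "b", "Profiles": [""]},'
    ' {"Name": "c", "Profiles": null}]'
)
print(machines_with_blank_profiles(sample))
```

An empty result, as with the machines listed above, suggests the blank name is coming from somewhere other than the stored `Profiles` arrays.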

wdennis
2018-01-29 20:37
The one that I got the screenshot error on was the one with `necla-ubuntu-default`

greg
2018-01-29 20:37
Do those exist?

wdennis
2018-01-29 20:37
yes they do

greg
2018-01-29 20:38
```drpcli profiles show necla-ubuntu-default``` returns something?

wdennis
2018-01-29 20:39
Yes, it exists

wdennis
2018-01-29 20:40
It's my "normal params" bag

greg
2018-01-29 20:42
okay - cool - it looks like it may be a ux bug

wdennis
2018-01-29 20:47
You folks know where's the best place to get help on preseed file settings? (specifically, `d-i partman/auto expert_recipe`)

vlowther
2018-01-29 20:48
I usually google for "debian preseed" and take my chances

wdennis
2018-01-29 20:48
Been there, done that :slightly_smiling_face:

vlowther
2018-01-29 20:48
yeah, that is as good as the docs get without reading you some Perl.

wdennis
2018-01-29 20:49
It's not taking my `/home` fs lvol spec...
```
d-i partman/auto expert_recipe string \
 custom_lvm:: \
 500 50 1024 free $iflabel{ gpt } $reusemethod{ } method{ efi } format{ } . \
 128 50 256 ext2 $defaultignore{ } method{ format } format{ } use_filesystem{ } filesystem{ ext2 } mountpoint{ /boot } . \
 10240 20 1228800 ext4 $lvmok{ } mountpoint{ / } lv_name{ root } in_vg{ testinstall } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \
 10240 20 10240000000 ext4 $lvmok{ } mountpoint{ /home } lv_name{ home } in_vg{ testinstall } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \
 50% 20 100% linux-swap $lvmok{ } lv_name{ swap } in_vg{ testinstall } method{ swap } format{ } .
```

vlowther
2018-01-29 20:50
yeah, I learned enough of that to get our default layout. That was years ago.

wdennis
2018-01-29 20:50
(and blissfully forgot it immediately thereafter :joy: )

vlowther
2018-01-29 20:51
Basically.

wdennis
2018-01-29 20:51
kickstart is _so_ much saner...

vlowther
2018-01-29 20:51
Though not with the urgency with which i forgot how to write Sendmail configuration files.

wdennis
2018-01-29 20:51
LOL

vlowther
2018-01-29 20:51
Before they had niceties like M4.

shane
2018-01-29 20:52
ewww ... sendmail .... I spent 2 weeks in classes with Eric Allman learning Sendmail from him ...

wdennis
2018-01-29 20:52
Well, looks like Debian IRC here I come...

vlowther
2018-01-29 20:53
Basically once I learned that the postfix mail server existed I promptly erased all traces of sendmail from my domain.

wdennis
2018-01-29 20:54
hear hear

shane
2018-01-29 20:56
the first open source commits I ever made were for Sendmail - to expand the queue from a single flat directory to a tiered N level structure for storing queued messages in ... since we were sending over 1 billion messages a quarter via 100 node sendmail cluster .... we constantly locked the machines up on triple/quadruple lookups in a single directory entry

zehicle
2018-01-29 20:56
Updates to Content UX should be live.

wdennis
2018-01-29 21:09
@shane nice

wdennis
2018-01-29 21:09
gives @shane UNIX War Medal

2018-01-29 22:20
hey, just checking out Digital Rebar as a potential replacement for our existing Foreman deployments. curious about the plugin system and how I can execute some post-provision actions, my google-fu is failing me and I can't find anything in the docs... can someone point me in the right?

2018-01-29 22:22
*right direction

shane
2018-01-29 22:22
well ... we're on the cusp of releasing an all-new plugin system that makes the current one completely obsolete ...

shane
2018-01-29 22:22
what are you trying to do specifically ? it may not require a plugin ... depending on what you want to do

2018-01-29 22:23
basically want to invoke a webhook on a remote system that we use for the rest of our automation

shane
2018-01-29 22:24
outgoing webhook during provisioning process ?

2018-01-29 22:24
tell that automation system 'hey a new physical node has just been provisioned, go do all the things, here is all the info you need'

shane
2018-01-29 22:25
you can do that fairly easily by adding in a Workflow Stage - which can trigger that webhook for you

shane
2018-01-29 22:25
that would (likely) be a post provisioning Stage, once you're all done, fire off the "phone home" sort of webhook

shane
2018-01-29 22:25
no need for plugin for that action

2018-01-29 22:26
yeah that's exactly what I'm looking for

2018-01-29 22:26
is there any examples of that? the o

2018-01-29 22:27
the only thing I can find about workflows in the docs is here http://provision.readthedocs.io/en/latest/doc/workflows.html

shane
2018-01-29 22:28
once you get Digital Rebar Provision up and running - there are several live examples you can look at ...

shane
2018-01-29 22:28
basically you'd write a Template that "does something" ... in "some language/script" you want on the provisioned OS ... you'd then add that as a Task to a Stage ...

shane
2018-01-29 22:29
that Stage becomes one of the last pieces in your Workflow
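
A minimal sketch of what such a "phone home" Template could contain, assuming a hypothetical webhook URL and payload (the `WEBHOOK_URL` default, the `build_payload` / `send_webhook` helpers and the JSON field names are all illustrative, none of them come from DRP itself):

```shell
#!/bin/sh
# Hypothetical "phone home" task body: tell an external automation system
# that this node has finished provisioning.
# WEBHOOK_URL and the JSON field names below are assumptions for illustration.

WEBHOOK_URL="${WEBHOOK_URL:-https://automation.example.com/hooks/provisioned}"

# Build the JSON payload from a hostname ($1) and an IP address ($2).
# No JSON escaping is done, so both values must be plain strings.
build_payload() {
    printf '{"event":"provisioned","hostname":"%s","address":"%s"}' "$1" "$2"
}

# POST the payload; --fail makes curl exit non-zero on an HTTP error status.
send_webhook() {
    curl --fail --silent --show-error \
        -X POST -H 'Content-Type: application/json' \
        --data "$(build_payload "$1" "$2")" \
        "$WEBHOOK_URL"
}

# Only fire when explicitly asked to, so the helpers can be tried out safely.
if [ "${1:-}" = "--send" ]; then
    send_webhook "$(hostname)" "$(hostname -I | awk '{print $1}')"
fi
```

Run with `--send` on the provisioned OS to fire the hook. Since it runs on the node, the node needs network access to the automation system.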

shane
2018-01-29 22:29
these are all keywords for you to take a look at our existing stuff in the Community Content

shane
2018-01-29 22:29
our doc isn't up to date w/ workflow and stages yet

2018-01-29 22:34
awesome I'll throw it on a VM tomorrow and check it out!

2018-01-29 22:34
thanks!

daniel.bernier
2018-01-29 22:47
@did the option 114 inside DRP but for some reason ONIE does not kick in for it

shane
2018-01-30 01:16
- Join us tomorrow (Tuesday Feb 1st) at 11:00 am PST for v010 of our Online Meetup topics: Versioned UX Endpoints, UX Content Versions, New Plugin System agenda: https://docs.google.com/document/d/1qe6ycKL2nJpNI9uJ0c1v5kyMWHXzfFyEm9d9x2Ptvfk



amontalban
2018-01-30 01:34
Nice, will try to join :+1:

zehicle
2018-01-30 02:13
note @nmaludy: using stages means that the nodes themselves make the web call. If you restrict the web call to the admin network then a plugin would be the choice. Stages are a simple way to do it and a good starting point. We can set up a call to discuss options if you'd like. You can also request a slack account (http://rackn.com/support/slack).

nmaludy
2018-01-30 11:30
has joined #json

romain.lafontaine
2018-01-30 13:29
I'll try to stay as much as possible, interested in the plugin system and the 3.7.0 ^^

zehicle
2018-01-30 13:59
Welcome @nmaludy

nmaludy
2018-01-30 14:10
@zehicle thanks!

shane
2018-01-30 14:54
- today's meetup we'd like to collect use case info around the new Plugin System ... if you think you have any interest in extending Digital Rebar through the plugins - come talk to us today, we'd love your perspective ... (see meetup link above for joining details)

zehicle
2018-01-30 16:05
That includes the collect inventory use cases that have been coming up lately.

romain.lafontaine
2018-01-30 16:05
Just to be sure, what's the timezone attached to the meetup 11AM ?

shane
2018-01-30 16:13
11am PST

romain.lafontaine
2018-01-30 16:14
Damn...

zehicle
2018-01-30 16:46
If you join the meetup or , both will make sure you get the invites on your calendar https://www.meetup.com/digitalrebar

romain.lafontaine
2018-01-30 16:46
I'll, I was guessing that the meetup page shows local time... I'm Tetris-ing my calendar

zehicle
2018-01-30 18:57
loves Tetris as a verb! well done

lae
2018-01-30 19:49
ah I was wondering where this came from lol https://pbs.twimg.com/media/DU0BscnVQAE7yqP.jpg

lae
2018-01-30 19:52
actually never mind, I'm not in either :thinking_face:

wdennis
2018-01-30 19:59
All the things batteries have died... good meetup tho!

wdennis
2018-01-30 20:00
And sorry (not sorry :) I'm such a UX nudge

shane
2018-01-30 20:03
:slightly_smiling_face:

vlowther
2018-01-30 23:24
Sorry I missed it, but my kids have cleaner teeth now!

wdennis
2018-01-30 23:26
We missed you, epic in-mem fs layers discussion

wdennis
2018-01-31 00:57
Freekin' preseed `partman/auto expert_recipe` wrestling...

richard.burrows
2018-01-31 02:24
has joined #json

wdennis
2018-01-31 03:25
@greg You still around?

wdennis
2018-01-31 03:47
Actually, anyone that knows anything about Debian (Ubuntu) preseed `partman/auto expert_recipe`? No matter what I've tried, I always get the same (wrong) result...

greg
2018-01-31 04:08
Not really. Can help some tomorrow maybe

wdennis
2018-01-31 04:10
OK, I'm out as well, defeated for the night... Tomorrow's another day

wdennis
2018-01-31 17:55
And now it's another day.. time for more preseed wrasslin'

wdennis
2018-01-31 19:36
Oh my lord, this is harder than it should be...

wdennis
2018-01-31 20:49
So, I have the following in my machine's preseed:


wdennis
2018-01-31 20:50
However, when the install completes, I'm getting this...


wdennis
2018-01-31 20:53
Why u make no /home LV???????

wdennis
2018-01-31 20:54
Anyone out there in DRP with a clue, I'm buying...

greg
2018-01-31 21:04
@wdennis - I think your root partition is min size 10G and max size 100G. with a priority of 102401. Your home drive is min size 10G and max size lots priority 102433. i think that means it will make the root full size and then have no room for /home.

greg
2018-01-31 21:04
It appears you have 100GB drive.

greg
2018-01-31 21:04
You may want to try and change 102400 on the / part to 51200 and see if you get a home drive.

greg
2018-01-31 21:04
Just for a test.

wdennis
2018-01-31 21:05
It's a 1TB (1000GB) drive...

greg
2018-01-31 21:05
hmm

greg
2018-01-31 21:06
not sure.

greg
2018-01-31 21:06
but I just read the priority should be between the min and max

wdennis
2018-01-31 21:06
Yes, it's basically a weighting value from what I understand

wdennis
2018-01-31 21:07
I have no idea where to get help on this... #debian in IRC yielded nothing...

greg
2018-01-31 21:07
I'd try putting the 102401 priority to 102399.

wdennis
2018-01-31 21:08
There's random blog posts, etc but none with any magic for me

vlowther
2018-01-31 21:08
da

vlowther
2018-01-31 21:09
That is the problem with preseeds

vlowther
2018-01-31 21:09
Horrible docs and horrible perl when the docs are not good enough.

wdennis
2018-01-31 21:09
I made a test server, installed by hand, with a correct partitioning, so I know it can be done; just don't know how via preseed...

wdennis
2018-01-31 21:10
Does anyone know if Ubuntu changed the Debian preseed logic, or if they stick strictly to Debian's?

wdennis
2018-01-31 21:10
(That would help with where to ask...)

greg
2018-01-31 21:12
not sure. You may also need to make sure root is first and then home

greg
2018-01-31 21:12
in the list.

wdennis
2018-01-31 21:13
I did play with that... It was that way, then I thought to flip them and see if that did anything different

wdennis
2018-01-31 21:14
I've done like 10 installs on the server w/ different partman params...

wdennis
2018-01-31 21:15
I'm thinking that when I (ever) understand this and get it, it would be cool to have a community library (content pack) of these disk partitioning layouts (both preseed and kickstart) to choose from

wdennis
2018-01-31 21:21
@vlowther Do you know where to read the source code that handles the preseed processing?

wdennis
2018-01-31 21:22
(yes, I've sunk to that level :disappointed:)

vlowther
2018-01-31 21:36
uh

vlowther
2018-01-31 21:36
Spread out across the repos in https://anonscm.debian.org/cgit/d-i/

vlowther
2018-01-31 21:36
$DEITY help us all

wdennis
2018-01-31 21:41
Where is your $DEITY now???

vlowther
2018-01-31 21:42
That is some heavy theology that we should probably avoid. :slightly_smiling_face:

wdennis
2018-01-31 21:50
Well, all I know is I'm in $PARTMAN_HELL :rage::imp::tired_face:

wdennis
2018-02-01 01:14
OK, out of desperation, posted a question on Ubuntu Launchpad... https://answers.launchpad.net/ubuntu/+question/663937

wdennis
2018-02-01 01:15
Beginning to think the LVM partitioner only supports root and swap LVs

shane
2018-02-01 01:44
wonders if chucking partman out the window is a better idea and just using `parted`, or other tools as part of a `d-i preseed/late_command string...` method

zehicle
2018-02-01 02:46
w00t new UX feature rolling through testing allows you to set Machine icon & color

wdennis
2018-02-01 06:45
Now investigating this: https://launchpad.net/kickseed

lae
2018-02-01 06:54
kickstart to preseed has several limitations

lae
2018-02-01 06:54
last i looked

lae
2018-02-01 06:55
also i'll take a closer look at your preseed later, heading out atm

greg
2018-02-02 05:00
- tip has been updated with fixes for the deadlock some have been seeing. Also, DRP will point to the stable UX by default. This includes more unit tests for DHCP.

greg
2018-02-02 05:00
We are getting close to 3.7.0

wdennis
2018-02-02 17:44
OK, victory in preseeding appears within reach! :the_horns:

wdennis
2018-02-02 17:44
But MacGyver would be proud...

ctrees
2018-02-02 17:45
ONLY if it blew up your test lab as you escape with a paper clip... :wink:

wdennis
2018-02-02 17:45
A) Ubuntu has a subsystem called "kickseed" that can take a kickstart file from the PXE command line, and auto-magically translate it into preseed...

wdennis
2018-02-02 17:48
So, from https://code.launchpad.net/~ubuntu-installer/kickseed/master get the code, and go into `~ubuntu-installer/kickseed/master`

wdennis
2018-02-02 17:49
B) Copy in a "Ubuntu-compatible" kickstart file (Ubuntu only supports a subset of kickstart) such as:

wdennis
2018-02-02 17:54

wdennis
2018-02-02 17:55
C) Then, run `./test-kickseed <kickstart_file>`

wdennis
2018-02-02 17:56
D) Take what you need from the preseed file which is generated & output to the screen

wdennis
2018-02-02 17:57
E) Profit!

ctrees
2018-02-02 18:00
Oh... so you get detailed output of all the script-expansion running the https://bazaar.launchpad.net/~ubuntu-installer/kickseed/master/view/head:/test-kickseed

ctrees
2018-02-02 18:11
I THINK I follow... so you've MacGyver'd a kickseed lint for ubuntu so you can pass a 'clean' (no additional expansion) to drp (really sledgehammer) cause it is CentOS based and does not expand as expected ? (Note: doing as a mental exercise as I need to figure out kickseed execution for an embedded system) ? ... Check my math... pretty COOL if I'm tracking correctly...

wdennis
2018-02-02 18:13
No. It's just a means to an end to generate preseed directives (basically, the `partman` disk-partitioning ones) from a kickstart file, which I understand better, and IMNSHO has a WAY saner config language (especially for disk partitioning!)

wdennis
2018-02-02 18:15
So from this kickstart config section:


wdennis
2018-02-02 18:16
I got this preseed section:


wdennis
2018-02-02 18:18
And, it works!


ctrees
2018-02-02 18:24
So the MacGyver was a 'preseed template creation' not a linter... or more of the 'collect underpants' stage

wdennis
2018-02-02 18:25
It was a "I know how to write the partitioning I want in kickstart, but can't figure out how to do the same in preseed" rosetta stone :slightly_smiling_face:

wdennis
2018-02-02 18:27
Now, the kewl thing with DRP templating is, @greg (or @vlowther maybe?) wrote support in for templates-in-templates (which Go templates don't have native support for, amirite?)

wdennis
2018-02-02 18:28
So now I can have a library of disk partition templates that I can "plug in" to my standard preseed template for Ubuntu

ctrees
2018-02-02 18:30
I know I have to do something like what you've pulled off as I've got a bunch of old HPE gear that I'll need to avoid the RAID stuff with... AND hope to get to use drp on some embedded f/w support stuff... have not done in the mud kickstart in decades... esp all the part stuff... glad you shoveled a path :wink:

shane
2018-02-02 18:36
@wdennis - yes @greg added the ability to have Nested templates, which are not native to Golang Templating ... and collecting a bunch of example partitioning schemes and allowing to select the right Nested Template based on a Param input would be a very very nice thing to have

wdennis
2018-02-02 18:40
Right now for testing, I am setting the `part-scheme` and `select-kickseed` params on the host itself, but in the future, I'll probably set them in the Profile that the hosts are set to

andreas.holmsten
2018-02-06 14:59
@wdennis I've also had some issues with the preseed and disk partitioning. Seems like some preseed options in the default template (btw part-scheme-default.tmpl got incorrect options) are in the wrong order which cause partitioning to not work correctly. Took a couple of hours for me to track it down but not had time to make a pull request yet

wdennis
2018-02-06 15:35
@andreas.holmsten Thx for info; want to share your findings in the meantime?

wdennis
2018-02-06 15:37
I'm slowly piecing together (trial/error due to poor docu) a partitioning sub-template that does /boot, /boot/efi, and then rest of disk for LVM PV that then gets split into multiple LV's (in one VG)

wdennis
2018-02-06 16:50
My as-of-now preseed partitioning is thus:


wdennis
2018-02-06 16:52
Funny thing is, it does the right thing on a HDD of 1TB or more; but on <1TB (tried on 500GB & 250GB) it just makes LV's for swap and root, no /home...

wdennis
2018-02-06 16:57
I think the weight values are funky, but they are exactly what the "kickseed" tool produced

wdennis
2018-02-06 19:48
So, what's the correct syntax to spec a couple of "Params" values on a machine? Did this, failed:
```
$ drpcli machines update 985a9585-1923-491d-b813-1070a3c11f51 '"Params": { "part-scheme": "separate_home-TEST", "select-kickseed": "necla-ubu-seed.tmpl" }'
Error: Failed to generate changed machines:985a9585-1923-491d-b813-1070a3c11f51 object: invalid character ':' after top-level value
```

greg
2018-02-06 19:52
use the get or set cli commands

greg
2018-02-06 19:53
`drpcli machines set <uuid> param <param-name> to <value>`

wdennis
2018-02-06 20:15

wdennis
2018-02-06 20:15
Thx @greg

wdennis
2018-02-06 20:18
No way to set multiple with the same command?

greg
2018-02-06 20:19
the update you were doing is the way.

greg
2018-02-06 20:19
```drpcli machines update 985a9585-1923-491d-b813-1070a3c11f51 '{ "Params": { "part-scheme": "separate_home-TEST", "select-kickseed": "necla-ubu-seed.tmpl" } }'```

greg
2018-02-06 20:20
Note the extra {}
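
For the multi-param case, a small sketch that builds the wrapped `{ "Params": { ... } }` object from `name=value` pairs, so the outer braces can't be forgotten. `params_json` is a hypothetical helper, not part of drpcli, and it assumes names and values contain nothing that needs JSON escaping:

```shell
#!/bin/sh
# Hypothetical helper: turn name=value arguments into the JSON object
# that `drpcli machines update <uuid> '<json>'` was given above.
# Assumes names/values are plain strings with no quotes or backslashes.
params_json() {
    out=''
    for kv in "$@"; do
        name=${kv%%=*}      # text before the first '='
        value=${kv#*=}      # text after the first '='
        # Append a comma separator only when something is already collected.
        out="${out}${out:+, }\"${name}\": \"${value}\""
    done
    printf '{ "Params": { %s } }' "$out"
}
```

It could then be used as `drpcli machines update <uuid> "$(params_json part-scheme=separate_home-TEST select-kickseed=necla-ubu-seed.tmpl)"`.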

greg
2018-02-06 20:20
To remove parameters, you have to use the remove subcommand.

wdennis
2018-02-06 20:29
Ah, OK

wdennis
2018-02-06 20:29
`remove` instead of `update`, or instead of `set`?

wdennis
2018-02-06 20:46
OK, it's like `$ drpcli machines remove 985a9585-1923-491d-b813-1070a3c11f51 param "part-scheme"`

wdennis
2018-02-06 20:47
Next issue: using the default DRP preseed partitioning, as so:


wdennis
2018-02-06 20:49
But getting this when the install hits the partitioning step:


wdennis
2018-02-06 21:42
^^^ anyone?

greg
2018-02-06 21:47
umm - start with checking the machine's aggregate parameters to see what is going on to make sure all is unset.

wdennis
2018-02-06 22:00
@greg You mean this?

wdennis
2018-02-06 22:00

wdennis
2018-02-06 22:01
The `"Params:"` section?

greg
2018-02-06 22:02
if you add `--aggregate` it will include parameters.

greg
2018-02-06 22:03
It appears you are using your own preseed.

wdennis
2018-02-06 22:04
Yes, but pulls in DRP default partitioning (`part-scheme-default.tmpl`)

wdennis
2018-02-06 22:05
Doing it wrong, I guess:
```
$ drpcli machines show 985a9585-1923-491d-b813-1070a3c11f51 --aggregate | jq 'del(.Params."gohai-inventory")'
Error: unknown flag: --aggregate
Usage:
  drpcli machines show [id] [flags]
```

greg
2018-02-06 22:05
sorry

greg
2018-02-06 22:06
`drpcli machines params <uuid> --aggregate`

greg
2018-02-06 22:06
it restricts to just parameters.

greg
2018-02-06 22:06
One test would be to unset kickseed and see what happens.


wdennis
2018-02-06 22:24
OK, trying one machine with stock DRP preseed, we'll see what happens...

wdennis
2018-02-06 22:26
So, when I create/edit my own templates, where do they live in the filesystem? In `saas-content`?

wdennis
2018-02-06 22:27
Because from now on, I'm version-controlling the hell out of them...

greg
2018-02-06 22:28
They are in the writable store. That is why I in general don't use clone actions, but create my own content bundle.

wdennis
2018-02-06 22:28
Where is that?

greg
2018-02-06 22:30
I create a directory, git init, throw in some files and then use `drpcli contents bundle`

greg
2018-02-06 22:30
to build a content bundle that I upload.

wdennis
2018-02-06 22:46
Sounds like the way to go...

wdennis
2018-02-06 22:47
Any docu on that process?

zehicle
2018-02-06 22:47
@wdennis we are creating a video for this. Our first attempt was pretty close, but needs to be updated for sound quality. https://youtu.be/yy7-2D4jXXg

wdennis
2018-02-06 22:47
Will check out... thx @zehicle

wdennis
2018-02-06 23:01
OK, booted a new machine, only changed the name & profile, NO custom preseed/partitioning set, still getting the "No root file system is defined" error...

wdennis
2018-02-06 23:03

wdennis
2018-02-06 23:04

wdennis
2018-02-06 23:05
Can anyone see any problems in the generated preseed? Should be DRP standard...

greg
2018-02-06 23:28
You could try to boot into sledgehammer and check to see if the disk is already partitioned, if so, wipe it (like in the erase-disk task), and try. See if the LVM pre-existing is getting in the way.

wdennis
2018-02-06 23:32
I actually did do that (the disk was used, did have pre-existing LVM; I did a `vgremove` then `pvremove` and thereafter `dd if=/dev/zero of=/dev/sda bs=512 count=1` to wipe MBR + part tbl)

wdennis
2018-02-06 23:33
So to DRP it should look like a blank disk.

greg
2018-02-06 23:34
it isn't DRP - it is ubuntu.

wdennis
2018-02-06 23:34
You are right

wdennis
2018-02-06 23:35
But anyways.

greg
2018-02-06 23:38
You may need to add the erase-hard-disks-for-os-install to your flow.

greg
2018-02-06 23:39
it does this:
```
#!/bin/bash
# Nuke it all.
declare vg pv maj min blocks name
# Make sure that the kernel knows about all the partitions
for bd in /sys/block/sd*; do
    [[ -b /dev/${bd##*/} ]] || continue
    partprobe "/dev/${bd##*/}" || :
done
# Zap any volume groups that may be lying around.
vgscan --ignorelockingfailure -P
while read vg; do
    vgremove -f "$vg" || :
done < <(vgs --noheadings -o vg_name)
# Wipe out any LVM metadata that the kernel may have detected.
pvscan --ignorelockingfailure
while read pv; do
    pvremove -f -y "$pv" || :
done < <(pvs --noheadings -o pv_name)
# Now zap any partitions along with any RAID metadata that may exist.
while read maj min blocks name; do
    [[ -b /dev/$name && -w /dev/$name && $name != name ]] || continue
    [[ $name = loop* ]] && continue
    [[ $name = dm* ]] && continue
    [[ $name = fd* ]] && continue
    mdadm --misc --zero-superblock --force /dev/$name || :
    if (( blocks >= 2048)); then
        dd "if=/dev/zero" "of=/dev/$name" "bs=512" "count=2048"
        dd "if=/dev/zero" "of=/dev/$name" "bs=512" "count=2048" "seek=$(($blocks - 2048))"
    else
        dd "if=/dev/zero" "of=/dev/$name" "bs=512" "count=$blocks"
    fi
done < <(tac /proc/partitions)
```

greg
2018-02-06 23:39
We found you also have to blast the end of the disk.
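
A rough sketch of the "blast both ends" idea for a single device, assuming the size is read with `blockdev --getsz` (which reports 512-byte sectors) rather than from `/proc/partitions`. `tail_seek` and `wipe_ends` are hypothetical names, and `wipe_ends` is destructive, so it is only defined here, never called:

```shell
#!/bin/sh
# Sector offset at which the last $2 sectors of a $1-sector device begin.
# Falls back to 0 when the device is smaller than the wipe window.
tail_seek() {
    if [ "$1" -gt "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo 0
    fi
}

# DESTRUCTIVE: zero the first and last 2048 sectors (1 MiB each) of device $1.
# Partition tables and LVM/RAID metadata can sit at either end of the disk.
wipe_ends() {
    dev=$1
    sectors=$(blockdev --getsz "$dev")   # size in 512-byte sectors
    dd if=/dev/zero of="$dev" bs=512 count=2048
    dd if=/dev/zero of="$dev" bs=512 count=2048 \
        seek="$(tail_seek "$sectors" 2048)"
}
```

This is much quicker than zeroing a whole 2TB disk, though it only clears what lives in the first and last 1 MiB.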

wdennis
2018-02-06 23:39
Is that a new stage?

greg
2018-02-06 23:40
it is a task that can be added to stage.

shane
2018-02-06 23:40
LVM leaves nasty poo all over the place and is a nightmare to get rid of ...

wdennis
2018-02-06 23:40
Yup

wdennis
2018-02-06 23:40
But I thought vgremove/pvremove would get rid of it...

shane
2018-02-06 23:40
nope

greg
2018-02-06 23:40
nope.

wdennis
2018-02-06 23:41
And the "nope"s have it!

wdennis
2018-02-06 23:42
Why does it work tho when I re-install one of my DRP-installed hosts? B/c it has same LVM structure?

greg
2018-02-06 23:43
probably

greg
2018-02-06 23:43
I'll probably need to post a new flow.

greg
2018-02-06 23:43
at some point that uses stage-chooser and a pre-stage to wipe the disk, but that is later.

wdennis
2018-02-06 23:56
Booted the non-installing node with sledgehammer, and doing a `dd if=/dev/zero of=/dev/sda bs=1M` to wipe the disk..

wdennis
2018-02-06 23:56
Of course, it's a 2TB disk, so that'll run for a while...

wdennis
2018-02-06 23:57
We'll see what I get with the normal install thereafter

andreas.holmsten
2018-02-07 10:00
@wdennis the "No root file system" error is exactly what I had to troubleshoot as well. First off, `d-i partman/auto expert_recipe string` in the partitioning scheme is incorrect syntax. It should be `d-i partman-auto/expert_recipe string`. Secondly I moved:
```
d-i partman/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
```
to after the partitioning scheme. Otherwise the installer will error on no root partition found (as there isn't one made yet).
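
A quick way to catch the first problem before booting a machine, sketched as a tiny lint for a rendered preseed file. `lint_recipe_key` is a hypothetical helper, and it only checks for the misspelled key, not the ordering of the `partman/confirm*` lines:

```shell
#!/bin/sh
# Hypothetical lint: flag a preseed file ($1) that uses the misspelled
# "partman/auto expert_recipe" key instead of "partman-auto/expert_recipe".
lint_recipe_key() {
    # -F matches the string literally, -q suppresses grep's own output.
    if grep -qF 'partman/auto expert_recipe' "$1"; then
        echo "misspelled key: use partman-auto/expert_recipe"
        return 1
    fi
    echo "ok"
}
```

Pointing it at the preseed the machine actually downloads would show whether the custom recipe is being silently ignored.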

andreas.holmsten
2018-02-07 10:03
The single reason the default preseed and partitioning work is because of the incorrect syntax. Since no manual scheme is selected the default partman atomic scheme is

andreas.holmsten
2018-02-07 10:21

andreas.holmsten
2018-02-07 10:24
Observe that I'm not really that good with preseed so my assumptions might be wrong but it's what resolved the issue for me

greg
2018-02-07 12:48
@andreas.holmsten seems expert to me :grinning:

greg
2018-02-07 12:48
I'll review and pull it in

2018-02-07 13:46
hi all - currently kicking the tyres to see if Rebar fits my use-case - looking good so far! I did run into this whilst attempting to pxe my first victim: "[0:1]TFTP: lpxelinux.0: transfer error: sending block 0" and the issue error seems to come from here https://github.com/digitalrebar/provision/blob/master/midlayer/tftp.go#L68. unfortunately i'm not familiar with Go and i'm unsure as to why that error has cropped up - can it just not find lpxelinux.0?

wdennis
2018-02-07 13:55
@andreas.holmsten Thanks - together maybe we can get a good base partitioning template, and then (my hope) maybe collaborate on community partitioning templates

wdennis
2018-02-07 13:59
Also @greg / @vlowther - maybe a good idea to put all of the partitioning (`d-i partman*`) directives into the base partitioning template - right now they are spread out over the preseed template and the partitioning template

shane
2018-02-07 14:02
@analbeard we're in a meeting - but check that your DRP Endpoint doesn't have an asymmetric routing issue. You might try to add --static flag to DRP start up, with the IP address from the interface on the provisioning machine side

2018-02-07 14:03
thanks Shane, that's certainly a possibility - the environment might need a little more work first

wdennis
2018-02-07 14:09
Hey RackN folk - at an Ansible training today, they are still referencing Cobbler on their slide deck:


wdennis
2018-02-07 14:25
Interesting - they keep mentioning Cobbler in their deck - strange for a dying OSS project to get such RHAT mentions?

wdennis
2018-02-07 14:25
@wdennis uploaded a file: https://rackn.slack.com/files/U416T0AAX/F96BHSF9V/cobbler-commits.png and commented: No commits since 1st week of Oct '17?

spector
2018-02-07 14:26
The project is essentially "dead". They release about 1 or 2 a year but it isn't active at all

wdennis
2018-02-07 14:27
May be an oppt'y to reach out to RHAT folks and pitch your product (I'm sure you've probably thought of that already :slightly_smiling_face: )

spector
2018-02-07 14:28
Yup, next time you see this go ahead and raise the hand and tell them all about Digital Rebar and to get with it!!!! We will of course reach out

vlowther
2018-02-07 15:17
@analbeard -- Those are expected at a rate of about 1 per boot, it is the nic firmware initiating a download to get the size and then aborting the transfer.

vlowther
2018-02-07 15:18
It then pulls it for real after that.

vlowther
2018-02-07 15:19
Downgrading that log message to info priority has been on my list of things to do.

vlowther
2018-02-07 15:21
As long as the systems proceed to PXE boot, you can disregard it for now.

wdennis
2018-02-07 16:01
Ok, I dutifully harangued the lead RHAT guy in the training about how Cobbler must die, and oh by the way, have you heard of RackN? :grin:

wdennis
2018-02-07 16:04
he said he doesn't see a lot of Cobbler out on the consulting gigs; mostly RedHat Satellite (upstream = Foreman)

gbuehler
2018-02-07 16:54
has joined #json

shane
2018-02-07 17:15
@gbuehler $welcome

2018-02-07 17:15
Digital Rebar community welcome information is here > http://rebar.digital/community/welcome.html

skluss
2018-02-07 17:38
has joined #json

jschulthies
2018-02-07 17:39
has joined #json

wdennis
2018-02-07 19:52
@greg Any way to register DRP events when a node PXE's and pulls pxelinux.0, then the kernel & initrd, so we can see those in the Event Log?

vlowther
2018-02-07 20:19
Not at this time.

wdennis
2018-02-07 20:43
Roadmap?

wdennis
2018-02-07 20:49
Would be great to know when the actual node installer boot happens?

vlowther
2018-02-07 21:12
Open an issue, or else I will forget by the end of the day. :grinning:


wdennis
2018-02-08 03:33

2018-02-08 13:41
afternoon all! i'm still poking DRP - maybe a little further along the road now. I've done a fairly vanilla install, bound DRP to the interface I want to use (using --static-ip), but I'm still unable to get any of the files in the tftpboot dir which obviously means the boot fails. I've tried connecting with a tftp client but that also fails with 'transfer timed out'

2018-02-08 13:41
any suggestion as to where i'm going wrong here?

greg
2018-02-08 13:58
Firewall rules?

2018-02-08 13:59
yup i just clocked that about two minutes ago

2018-02-08 14:00
palm has been vigorously applied to face

greg
2018-02-08 14:00
:grinning:

2018-02-08 14:01
someone should bust me back down to first line for that

2018-02-08 14:10
possibly a daft question, but would the serial console show anything when the victim has booted into the sledgehammer env? i can see it in the machine list so it's been successful, but there's nada on the console

greg
2018-02-08 14:11
It will depend upon your hardware / env. We don't pass a serial console, so it is linux defaults.

greg
2018-02-08 14:11
For packet, we add profiles that set the `kernel-console` parameter

greg
2018-02-08 14:12
To handle this case.

greg
2018-02-08 14:12
You can set that globally or on a machine or a profile to a machine.

greg
2018-02-08 14:12
Checkout the parameter `kernel-console` in the UX.

2018-02-08 14:12
ok, that makes sense. it's not the end of the world, just nice to see some output to understand what's going on, especially whilst i'm just poking it. if we were to use it in production then it wouldn't matter

2018-02-08 14:12
ok, will do. thanks!

greg
2018-02-08 14:13
makes sense. Also, if you set up the `access-keys` parameter (docs has some stuff on this), you can ssh in to the box as well.

greg
2018-02-08 14:13
@faq


greg
2018-02-08 14:14
in faq - 22.3


2018-02-08 14:15
ah yes, that looks handy. thanks Greg!

greg
2018-02-08 14:15
You can do the command in the doc to the `global` profile and it will be available to all machines all times.

2018-02-08 14:20
ah yes i see what you mean, i can see a key for 'galthaus@Gregs-MacBook-Pro.local' in the root-access-example profile ;)

greg
2018-02-08 14:21
well - you know. author privs and all

greg
2018-02-08 14:22
The main thing to remember is that the parameter has to be set on the machine (globally, or specifically, or by profile assignment) when the task `ssh-access` runs during the discover stage. That means for discovered things you have to have it globally set. Or set it after discovery and reboot.

2018-02-08 14:25
sure, that makes sense. I think i've probably got enough to get something going now

2018-02-08 14:25
thanks again!

2018-02-08 15:08
hmm, I've had a machine PXE booted for half an hour or so but i'm unable to progress to an install: `Can not change bootenv while in a stage unless forced. old: sledgehammer new ubuntu-16.04-install`

2018-02-08 15:08
the machine is currently sitting in the discover stage according to its info

greg
2018-02-08 15:24
This means you have a task that didn't complete I think.

greg
2018-02-08 15:25
Check the jobs area to see if there is a failed job. The machine has probably been marked not runnable as well.

lae
2018-02-09 07:48
```
[lae@yuzu fireeye-content]$ drpcli profiles update global global.yaml
Error: Failed to generate changed profiles:global object: invalid character '-' in numeric literal
```

lae
2018-02-09 07:48
I'm getting this I guess after a recent update, did anything change regarding importing profiles from yaml?

lae
2018-02-09 07:49
kind of expected this to also work, but I guess it's not expecting stdin to be yaml anymore?
```
[lae@yuzu fireeye-content]$ drpcli profiles show global -F yaml > tmp.yml
[lae@yuzu fireeye-content]$ drpcli profiles update global - < tmp.yml
Error: Failed to generate changed profiles:global object: invalid character 'A' looking for beginning of value
```

2018-02-09 09:40
@lae I had that yesterday, I think it's meant to be JSON

lae
2018-02-09 09:42
yes, but yaml used to be allowed

lae
2018-02-09 09:43
I'm just reusing my existing workflow for updating the global profile, which seems to not support yaml anymore

lae
2018-02-09 09:44
(I just went ahead and exported/edited/reimported as json for my immediate need but that's more tedious than editing yaml and committing it to git)

2018-02-09 09:45
@greg (or anyone at RackN) - is there no way to view the UI than through the RackN website? The environment my DRP box will live in won't be internet accessible, and even if it were I can guarantee our security team would have a shit-fit if I suggested doing that

2018-02-09 09:46
@lae I have found some of the docs to be a little out of date and I had to fudge my way around it - editing profiles was one of them

lae
2018-02-09 09:47
the rackn UI doesn't access your DRP instance over the internet, it uses JS in your browser to access it - so you just need to be able to access the DRP endpoint from your browser

2018-02-09 09:47
oh - i hadn't investigated it because I thought that was how it worked. that certainly improves things!

lae
2018-02-09 09:52
and uh, my point is that this seems to be an unexpected regression in one of the recent releases. Anyway, I just tried downgrading drpcli to 3.4.1 and 3.2.1 (from 3.6.0) and that works in 3.2.1 but not 3.4.1

zehicle
2018-02-09 14:12
@analbeard, yes. That's a commercial offering of the ux.

zehicle
2018-02-09 14:14
We call that "air gap" but @lae is right. The ux does not require firewall holes because it uses CORS multi site.

vlowther
2018-02-09 14:37
@lae broken yaml support is definitely a bug. Open an issue?

greg
2018-02-09 14:46
@lae I opened an issue on that with the workaround for now. I was in meetings and need to look at it

greg
2018-02-09 14:47
Well the profile failing on redirect is more than what I was seeing.

2018-02-10 19:28
Just reinstalled with force. When I go to setup subnets in UX I just get a spinning "Loading Interfaces". Any suggestions? I am a newbie at drp.

2018-02-10 19:29
Running v3.6.0

shane
2018-02-10 19:43
@MattyBoy4444 - are you sure your DRP Endpoint is accessible from your Laptop/Management machine - no Firewalls or IPTables rules blocking access? You can also check the process to make sure it's running on the Endpoint as well (`ps -ef | grep dr-provision`) ... also - any log output from the running instance if it is running ?

shane
2018-02-10 19:44
you need TCP Port 8092 access to the DRP Endpoint from the system you are running the Web Browser connection to the Portal

2018-02-10 20:15
@rackneng I can access the UX frontend from my management machine. I did check Chrome console and found a jquery warning

2018-02-10 20:15
@rackneng jquery.min.js:2 jQuery.Deferred exception: Cannot read property 'push' of undefined TypeError: Cannot read property 'push' of undefined at https://rackn.github.io/provision-ux/build.js:14099:48 at Array.map (<anonymous>) at https://rackn.github.io/provision-ux/build.js:14076:38 at l (https://rackn.github.io/provision-ux/vendor.js:93588:443) at uu (https://rackn.github.io/provision-ux/vendor.js:93649:252) at Function.On.flatMap (https://rackn.github.io/provision-ux/vendor.js:93681:101) at Object.<anonymous> (https://rackn.github.io/provision-ux/build.js:14073:36) at j (https://rackn.github.io/provision-ux/vendor.js:93579:29999) at k (https://rackn.github.io/provision-ux/vendor.js:93579:30313) undefined


greg
2018-02-10 20:25
You may want to try changing the base url to http://portal.rackn.io

2018-02-10 20:27
@rackneng the log inside ux is basically empty

graziee
2018-02-11 15:32
has joined #json

2018-02-11 17:09
@rackneng I am running tip BTW. Sure looks like a bug due to the following changeset: https://github.com/digitalrebar/provision/commit/00f5ac97b8fab08353f03d068eab96948b706581

allen.swackhamer
2018-02-11 18:29
has joined #json

greg
2018-02-11 20:30
@MattyBoy4444 - the image you sent us - shows us that you are running v3.6.0 and a test UX.

greg
2018-02-11 20:31
You could try to change: `http://rackn.github.io` to `http://portal.rackn.io` and see if it loads differently.

greg
2018-02-11 20:32
The commit reference you made wouldn't make a ux hang, because it doesn't change API output. It would change internal actions. It also makes fewer things required.

2018-02-11 22:16
@rackneng Well I installed from this URL. https://github.com/digitalrebar/provision/releases/download/tip/dr-provision.zip. Also the error is in reference to "NextServer" being missing, which is what that changeset removes. My 2 cents.

greg
2018-02-11 22:20
Hmm - okay. The image you sent doesn't align with that, but probably true.

2018-02-11 22:22
@rackneng If you dig into Build.js on that line, I think it references o.NextServer

greg
2018-02-11 22:22
Yeah - I'm looking at it. The UX is not handling the fact that NextServer can be unset.

greg
2018-02-11 22:28
Actually, it is just a UX bug.

2018-02-11 22:28
That is what I thought.

greg
2018-02-11 22:28
I still don't think you are using tip, but just a second.

greg
2018-02-11 22:34
@MattyBoy4444 - try the UX again.

greg
2018-02-11 22:35
Then we should try and figure out the version of drp you have.

2018-02-11 22:35
I was afraid you would say that. :) I just wiped and started installing stable. Hmmm... I could wipe again and start over. Not a big deal. I was just setting up a test platform.

2018-02-11 22:36
??

greg
2018-02-11 22:36
The bug I think is happening, and that I tried to fix, only occurs if you are using a stable DRP against the master UX.

greg
2018-02-11 22:36
That is what your image was showing.

greg
2018-02-11 22:37
It would be nice to see what was in your `Info and Preferences` page or `dr-provision --version`

greg
2018-02-11 22:37
If you used `install.sh` from tip, it will still grab stable.

2018-02-11 22:39
I am trying again. Probably take 15 min

2018-02-11 22:40
Basically, I rm /var/lib/dr-provision and then follow this again. http://provision.readthedocs.io/en/stable/doc/install.html

2018-02-11 22:42
I originally had stable on this box. I assumed following these instructions after the remove, would basically be like doing a fresh install.

greg
2018-02-11 22:42
I think so.

2018-02-11 22:43
I am following that same procedure now. I HOPE I didn't waste your time.

greg
2018-02-11 22:49
no

greg
2018-02-11 22:50
The big thing is that to get tip, you have to explicitly ask for it if you are using install.sh.

greg
2018-02-11 22:50
I needed to fix the UX bug.

2018-02-11 22:53
I may have downloaded the tip and then ran the following command to install: sudo ./install.sh --force install

2018-02-11 22:53
I didn't see the switch for the version. So now i just ran the following command: sudo ./install.sh --force=true --version=tip install

greg
2018-02-11 22:54
hmm - I think it should be --drp-version=tip

greg
2018-02-11 22:54
nvm - either works now.

greg
2018-02-11 22:54
well for tip install.sh

greg
2018-02-11 22:54
:slightly_smiling_face:

2018-02-11 23:09
@rackneng I can now setup a subnet!!

greg
2018-02-11 23:12
:slightly_smiling_face:

shane
2018-02-11 23:18
@greg - I added `--version` because I had documented it as ... `--version` ... but I left the `--drp-version` flag for backward compatibility ... :slightly_smiling_face:

shane
2018-02-11 23:18
@MattyBoy4444 - I'd also suggest switching the Doc version to `latest` - not the `stable` docs ...

2018-02-12 00:05
Thanks for all the help. It is up and running. Now, if I could get the damn Up Squared Intel UEFI board to net boot. I get the following error from tftp: TFTP: lpxelinux.0: transfer error: sending block 0: code=8, error: User aborted the transfer

2018-02-12 00:05
Any suggestions. ARG!!!!

greg
2018-02-12 01:48
firewall?

2018-02-12 03:09
Na. They are on same subnet

2018-02-12 03:10
Connected to same switch

2018-02-12 03:11
I saw there were some uefi issues in January. Have these been resolved and are the changes in the tip?

greg
2018-02-12 03:12
They are in tip. Firewall on DRP endpoint

2018-02-12 03:22
It is a clean install of Ubuntu server 16.04 with No extras/lamp

greg
2018-02-12 03:32
Some in the community have had issues with iptables putting in tftp blocking rules.

greg
2018-02-12 03:33
While not your current issue, a future issue will be that you need to unset the bootfile in the subnet (not needed in tip), because lpxelinux.0 doesn't support uefi.

lae
2018-02-12 13:17
https://github.com/digitalrebar/provision/pull/684 I'm guessing the "DRP freezes up issue" I'm seeing is related to this? lol

daniel.bernier
2018-02-12 13:32
hi anybody can explain why all the UX pages work perfectly EXCEPT for "machines" which since yesterday stays at "loading machines"

zehicle
2018-02-12 13:54
@daniel.bernier if you are on the github URL then you may have hit a bug w/ a new feature. Login to https://portal.rackn.io -> that version of the UX is more stable

zehicle
2018-02-12 13:55
if you know how to look at your browser's dev tools, it would be helpful to know which network call(s) are failing on your system

zehicle
2018-02-12 13:56
@daniel.bernier do your machines run gohai to get inventory information? that could also be the issue. the new code shows machine inventory values on the machines page

greg
2018-02-12 14:50
Yes @lae

greg
2018-02-12 14:51
Likely. Working to cut 3.7 soon

daniel.bernier
2018-02-12 15:18
@zehicle thanks switched to http://portal.rackn.io and issue was cleared. As per Gohai, it only runs as part of discovery so far - haven't played too much with it yet. Havin' a ball with workflows right now :smile:

lae
2018-02-12 15:29
is http://portal.rackn.io going to be set as the default for drp stable at least?

shane
2018-02-12 15:29
@lae yes

shane
2018-02-12 15:29
on v3.7.0 release that will be the default redirect and is the "production" UX endpoint

lae
2018-02-12 15:30
got it, that way it makes sense

lae
2018-02-12 15:30
and I guess tip would stay the same?

shane
2018-02-12 15:30
yep - a little bit of a chicken-and-egg issue getting the versioned UX Endpoints in place and ready to make the switch in the DRP side

greg
2018-02-12 15:31
tip will still point to portal. You can change portal to latest to get the better edge.

lae
2018-02-12 15:31
https://github.com/digitalrebar/provision/issues/617 also this is affecting me as well, downgraded drpcli to 3.2.1 and it successfully changes it on a DRP 3.6.0 server

shane
2018-02-12 15:32
going forward; the github UX Endpoint will always be the "most recent (master) ... and probably a lil bit unstable"

shane
2018-02-12 15:35
we'll control UX updates via the following flow:
http://rackn.github.io - latest master; unstable
http://latest.rackn.io - will be like DRP "tip" - generally stable latest features
http://portal.rackn.io - production released version

zehicle
2018-02-12 21:05
has suspended the Gitter & IRC synchronization with this channel

mchill
2018-02-13 06:39
has joined #json

tsahiduek
2018-02-13 07:31
has joined #json

tsahiduek
2018-02-13 12:40
Hey, I'm new to digital rebar and I have a question regarding workflows. From what I understand different profiles can have different workflows for different Bare-metal installation types (please correct me if I'm wrong). The thing I don't understand is how to determine which machine will "pxe-boot" to the right profile? How do I make the connection between a machine that just booted-up and the appropriate profile? Thanks

greg
2018-02-13 13:38
@tsahiduek - the profile containing the workflow you want needs to be added to the machine's profile list.

greg
2018-02-13 13:39
That is the simple answer. The question is how and when to do that.

greg
2018-02-13 13:40
There are a lot of methods for that. One is manually from the UX or CLI. Another is terraform for a "grab and go" style of operations. Another is to write a stage/task set to classify the node as it goes through discovery. The workflow can be modified "inflight" during the discovery workflow.

greg
2018-02-13 13:41
This is where your use and goals for deployment and operation come in to help you make that decision.

tsahiduek
2018-02-13 14:08
I want to automate (not via the UI) the process of installing bare metal. I want to create profile for each "type of installation" - for example: centos 7 for DB cassandra, ubuntu 16.04 for openstack etc. How do I identify the machine that just "PXE booted" to appropriate profile? I hope I'm clear about what I'm trying to do?

shane
2018-02-13 14:15
@mchill $welcome

2018-02-13 14:15
Digital Rebar community welcome information is here > http://rebar.digital/community/welcome.html

shane
2018-02-13 14:17
@tsahiduek - at some point, you have to be able to classify your systems - if you can do that based on CPU/Memory/Disk - then you can use the Gohai Inventory components (except for disk) ... and you can write a Param back to Digital Rebar Provision (DRP) Endpoint that has that classification - then in later Stages in the Workflow - you can use that Param to "do something" specific with the classification

shane
2018-02-13 14:17
you can also write a Stage (that uses a Task and Template) to call out to a DCIM/Asset Management system of some sort and ask it what the system should be (maybe based on MAC address, Serial Number ... or something else)

shane
2018-02-13 14:18
again - you'd then tag your Machine with a Param with the classification info and adjust your workflow accordingly

shane
2018-02-13 14:19
if you use the Stage/Task/Template route - note that any queries made to your DCIM/Asset Management will be **from** the Machine being provisioned

shane
2018-02-13 14:20
in some environments, this will not work - as the provisioning networks do not have access based on security policy - in which case you might need to write a Plugin for the DRP Endpoint, to make the query on behalf of the Machine - so you can control the security aspects
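
As a rough illustration of the classification step described above, the sketch below maps CPU and memory figures to a class name that could then be stored as a Param on the Machine. The class names, thresholds, and the function itself are hypothetical, and it assumes the inventory values have already been pulled out of the Gohai output; writing the Param back to the DRP Endpoint is not shown.

```python
def classify_machine(cpu_cores: int, memory_gib: float) -> str:
    """Pick an installation class from basic inventory values.

    Thresholds and class names are made up for this example; a real
    deployment would choose its own, or ask a DCIM system instead.
    """
    if cpu_cores >= 16 and memory_gib >= 64:
        return "openstack-compute"  # e.g. ubuntu 16.04 for openstack
    if memory_gib >= 32:
        return "cassandra-db"       # e.g. centos 7 for cassandra
    return "general"


# Example: a 24-core, 128 GiB box lands in the first class.
machine_class = classify_machine(cpu_cores=24, memory_gib=128)
```

Later Stages in the Workflow could then branch on that value, which is the "do something specific with the classification" part of the explanation.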

tsahiduek
2018-02-13 14:21
Thanks for the detailed answer. I'll try to go in with the "DCIM way". Thanks :slightly_smiling_face:

greg
2018-02-13 15:42
@lae - hit me up when you can. I want to move all the partman directives into the part template. I think this will hit you the most.

greg
2018-02-13 15:42
Bug #53 in community content.

wdennis
2018-02-13 16:37
@greg ^^^ nice - would affect me as well (but I asked for it)

greg
2018-02-13 16:41
yeah - I figured. :slightly_smiling_face:

spector
2018-02-13 16:41
Community - I sent out a Monthly RackN Newsletter yesterday and if you did not receive it and would like to, please ping me. I do this 1x a month and it is pretty short but highlights events we are attending and key themes; this month was Cobbler. I understand that Digital Rebar is an open source community and I am not going to market to you (I promise) but just wanted to make everyone aware of this newsletter in case you are interested. Return to your open source goodness...

lae
2018-02-13 17:27
@greg I was actually meaning to talk about that

lae
2018-02-13 17:27
(not the issue but about partman directives)


greg
2018-02-13 17:30
yikes - oops

lae
2018-02-13 17:30
We're currently using our own debian-9 and ubuntu-16 stages with that change as well as a change specifying a repo mirror (since it doesn't look like the Repo stuff is in community-content - and I haven't had a chance to look into it)

lae
2018-02-13 17:30
Anyway, I remembered why I may not have submitted that PR

lae
2018-02-13 17:31
I wasn't sure if we should have, e.g., "part-seed-X" for d-i templates and "part-ks-X" for ks templates

lae
2018-02-13 17:32
I had named it "part-scheme-X" to reduce the number of files to maintain and because we can check for OS in the part-scheme - this is actually how I was doing part templates in cobbler

lae
2018-02-13 17:33
but, apart from a default template, in my experience it's not very common to use a single template for both centos/debian

greg
2018-02-13 17:33
I thought about it. You have to manage profiles either way. So I let it be a single variable. Part-scheme is a bug above

greg
2018-02-13 17:34
the `select-kickseed` parameter is used in the default bootenvs to override. If you have your own bootenvs, then it is less of an issue.

lae
2018-02-13 17:36
@lae uploaded a file: https://rackn.slack.com/files/U54E4SD4G/F98J9QR29/image.png and commented: actually never mind, turns out I didn't check for OS within the templates during our cobbler days, lol

greg
2018-02-13 17:36
:slightly_smiling_face:

lae
2018-02-13 17:37
want me to submit that PR?

greg
2018-02-13 17:38
already took it. :slightly_smiling_face:

greg
2018-02-13 17:38
tip content already updated

lae
2018-02-13 17:38
wew, kk

greg
2018-02-13 17:41
@lae - the net of this was meant for me to warn you that I'm going to move all the partman refs inside the scheme tmpl. This way people can deal with their own gpt or not. Lvm or not.

lae
2018-02-13 17:42
hm

lae
2018-02-13 17:42
moving *all* of `d-i partman` is going to introduce a lot of duplication

lae
2018-02-13 17:43
I can understand moving some of them that actually configure partitioning, but several of them are to avoid prompts

greg
2018-02-13 17:43
yeah the question is one of ordering.

lae
2018-02-13 17:43
ordering of the commands? it shouldn't matter I'm pretty sure

greg
2018-02-13 17:44
@greg uploaded a file: https://rackn.slack.com/files/U02DGQYK1/F98EC1UKW/-.yaml and commented: This is it right now.

greg
2018-02-13 17:44
It seems to have for others in the community.

wdennis
2018-02-13 17:44
@lae I'm empirically finding out that the partman directive ordering does count... (it seems anyways)

greg
2018-02-13 17:45
The last four could live outside. They are common and always that value I think.

wdennis
2018-02-13 17:45
And for LVM vs. not, or md raid with LVM on top, directives differ

lae
2018-02-13 17:45
like, putting just the labels inside part-scheme causes issues?

wdennis
2018-02-13 17:45
No, the directives may need to change depending on what partitioning scheme is used

wdennis
2018-02-13 17:46
makes sense to have them all in one "container" (template)

wdennis
2018-02-13 17:46
instead of split between two

wdennis
2018-02-13 17:47
I use `select-kickseed` to template non-partitioning directives

wdennis
2018-02-13 17:48
that may differ among my builds

lae
2018-02-13 17:48
I'm just not a big fan of a lot of boilerplate, but I guess if people are actually having issues with the current layout, I can't really quite complain

lae
2018-02-13 17:48
(and I'm just slightly surprised I haven't run into any issues with the part-schemes I've written)

wdennis
2018-02-13 17:48
then `part-scheme` should have all of the partitioning in it, enabling "mix-n-match" between the two

greg
2018-02-13 17:49
I'm in favor of localizing common stuff. I need to think about this some more.

greg
2018-02-13 17:49
well the problem becomes one of depth of nesting.

wdennis
2018-02-13 17:49
Does anyone know, if you add d-i directives that aren't needed/called on, whether it messes up the installer automation?

greg
2018-02-13 17:50
not sure

wdennis
2018-02-13 17:50
me either

lae
2018-02-13 17:50
I don't usually need several of the boilerplate in net-seed - specifically LVM, and stuff provisions fine

greg
2018-02-13 17:50
probably depends upon the directive and its use.

wdennis
2018-02-13 17:51
Also, was thinking that if all partitioning is in a single template, then the community could build up a library of partitioning templates that could be easily plugged into the default DRP-provided kickseed, or a customized one

wdennis
2018-02-13 17:52
(also need to think about/do the same for RedHat-family distros, using kickstart syntax)

greg
2018-02-13 17:53
need to think about this and let it cook a little. I'm not going to change anything at this instant. I think.

wdennis
2018-02-13 17:53
OK, fair enough

greg
2018-02-13 17:54
I think I can envision some ways to get everybody to where they want to go, but want it to bake a little more.

wdennis
2018-02-13 17:55
I just had to fall back to Cobbler / Clonezilla imaging to do deploys, b/c DRP preseed templates were erroring out on platforms I need to install with specific partitioning requirements...

wdennis
2018-02-13 17:56
I was thinking it was the interaction between the `d-i partman*` directives in the net-seed.tmpl (or custom version thereof) and the directives I put in my custom `part-scheme`-called template

greg
2018-02-13 17:57
ok

wdennis
2018-02-13 17:58
It's basically not really a DRP-related problem (except for the `d-i partman*` directives split, if that's the issue) but more of a "how to do a specific partitioning recipe in preseed" problem

wdennis
2018-02-13 18:00
I just don't want to have to have pairs of `select-kickseed` and `part-scheme` templates that I need to keep track of

wdennis
2018-02-13 18:01
Hence the ask to combine all `d-i partman*` stuff into one template

gbuehler
2018-02-13 18:32
for a new deployment would the recommended path be to use the dockerized DRP?

shane
2018-02-13 18:34
@gbuehler - I hope you are not referring to the old Digital Rebar ver2 version? Digital Rebar Provision (DRP) ver3 is NOT containerized as distributed (but can easily be built in to a container)

shane
2018-02-13 18:38
We look forward to seeing you at the V011 meetup in 20 mins or so ... details: https://www.meetup.com/digitalrebar/events/247321385



lae
2018-02-13 20:03

lae
2018-02-13 20:05
@wdennis hold up - when you made custom `part-scheme` templates are you sure you weren't hitting this bug? https://github.com/digitalrebar/provision-content/pull/56/files

lae
2018-02-13 20:06
where the part-scheme wouldn't have been loaded if it wasn't named `part-seed-$scheme`

lae
2018-02-13 20:06
I had fixed it in a local template a long time ago

wdennis
2018-02-13 20:06
@lae Yeah, figured that one out...

shane
2018-02-13 20:08
- v3.7.0 is going to be cut in the next day or two ... if any of you have extra cycles to test `tip` - please do so - there are a LOT of changes, bug fixes, and enhancements ... we appreciate any additional testing and verification in different environments prior to cutting the v3.7.0 release - THANKS !!

gbuehler
2018-02-13 20:12
i think @greg already captured this, but pinning major versions in docker hub would be super cool

lae
2018-02-13 20:13
there are docker image releases?

greg
2018-02-13 20:13
strangely enough there kinda is.

gbuehler
2018-02-13 20:14
i mean, as long as you love living off `master` there are


greg
2018-02-13 20:14
but like @gbuehler mentions it is rebuilt when I move tip. So not quite master, but close.

lae
2018-02-13 20:14
oh

lae
2018-02-13 20:14
i've just been living through my ansible stuff lol

greg
2018-02-13 20:18
More thinking for me to do.

lae
2018-02-13 20:54
while I'm still awake

lae
2018-02-13 20:54
I have the following stage

lae
2018-02-13 20:54
```
[lae@yuzu fireeye-content]$ cat content/stages/labs-debian-9.yml
---
Name: "labs-debian-9-install"
Description: "Debian 9 install stage for FireEye Labs environment."
BootEnv: "labs-debian-9-install"
RunnerWait: true
Tasks:
  - "ubuntu-drp-only-repos"
  - "enforce-public-key-authentication"
  - "default-user-access"
Meta:
  icon: "download"
  color: "yellow"
  title: "FireEye Content"
```

lae
2018-02-13 20:55
I removed change-stage recently after parsing through some chat logs and community-content commit history, but I still have the issue where `drpcli processjobs` hangs after all tasks complete

lae
2018-02-13 20:56
i see all the completed jobs in the UI

lae
2018-02-13 20:57
but if I run `drpcli machines update $UUID '{ "Runnable": true }'` externally, it exits and finishes the install

greg
2018-02-13 20:58
Not sure why that frees it. Two things:

greg
2018-02-13 20:59
1 set runnerwait to false will cause the runner to exit when done with all tasks in the stage assuming no workflow changes stage on you

greg
2018-02-13 21:00
2 if this is part of a workflow, use the stop action instead of success for the last stage you want to run during the seed file

greg
2018-02-13 21:01
Stop in the workflow will cause the runner to exit on the stage change.

lae
2018-02-13 21:01
` labs-debian-9-install: "complete-nowait:Success"`

lae
2018-02-13 21:01
oh

lae
2018-02-13 21:01
I see

lae
2018-02-13 21:01
I had also tried setting `RunnerWait: false` previously but it hadn't helped (and noticed it was true in the community repo anyway)

greg
2018-02-13 21:02
Well that should have worked. I think.

greg
2018-02-13 21:03
Does complete-nowait have runnerwait true?

lae
2018-02-13 21:04
complete-nowait is in community repo and it has runnerwait false last i checked

lae
2018-02-13 21:04
(which is the point of nowait after all)

lae
2018-02-13 21:04
let me try again

lae
2018-02-13 21:06
while i'm at it guess I'll test some things in drpcli tip

lae
2018-02-13 21:48
yeah so setting RunnerWait: false in the stage itself had no effect, I needed to update change-stage/map to Stop
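
For readers following along, the fix above amounts to changing the action half of the `change-stage/map` entry from `Success` to `Stop`. The Python sketch below only illustrates the shape of that entry as quoted in this conversation (`stage: "next-stage:Action"`); the helper `next_step` is hypothetical and is not how the DRP runner itself is implemented.

```python
# The entry as it would look after the fix: same next stage, action set to Stop.
change_stage_map = {
    "labs-debian-9-install": "complete-nowait:Stop",
}


def next_step(stage_map: dict, current_stage: str) -> tuple:
    """Split a 'next-stage:Action' value into its two parts."""
    next_stage, _, action = stage_map[current_stage].partition(":")
    return next_stage, action


next_stage, action = next_step(change_stage_map, "labs-debian-9-install")
# With the Stop action, the runner is expected to exit on the stage change.
```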

lae
2018-02-13 21:51
also my yaml issue and the change stage error both appear resolved for me on tip

shane
2018-02-13 21:51
woot woot !!

shane
2018-02-13 21:51
@lae ... go to sleep - you're making me tired just thinking about you being up still :slightly_smiling_face:

lae
2018-02-13 21:53
I'm like, not sleepy

shane
2018-02-13 21:53
Yay for energy drinks !!

lae
2018-02-13 21:53
:joy:

lae
2018-02-13 21:58
https://aur.archlinux.org/packages/drpcli-tip and for the arch-initiated users I guess I made a drpcli-tip PKGBUILD

markw
2018-02-14 16:06
has joined #json

shane
2018-02-14 19:17
@markw $welcome

2018-02-14 19:17
Digital Rebar community welcome information is here > http://rebar.digital/community/welcome.html

wdennis
2018-02-14 19:53
I updated the 'ipmi' plugin to latest (tip) on my v3.6.0 install, and it seems to have hung the server... I killed it and restarted (running isolated), and this is what I see:

wdennis
2018-02-14 19:53

shane
2018-02-14 19:53
yep - that will do it

shane
2018-02-14 19:54
`tip` plugins are `plugin-v2` style

shane
2018-02-14 19:54
v3.6.0 stable is `plugin-v1` style - radically different

wdennis
2018-02-14 19:54
Why did it let me update?

shane
2018-02-14 19:54
completely and utterly non-compatible

shane
2018-02-14 19:54
because

wdennis
2018-02-14 19:54
And I would know that how?

shane
2018-02-14 19:55
what UX endpoint were you using ?

wdennis
2018-02-14 19:55

wdennis
2018-02-14 19:55
Same as ever...

shane
2018-02-14 19:56
our new `stable` UX endpoint going forward, which you should use instead of that is: https://portal.rackn.io

wdennis
2018-02-14 19:57
I thought that's with the new v3.7 when released?

greg
2018-02-14 19:57
In this channel, I said NOT to do that.

shane
2018-02-14 19:57
the default redirect in v3.7.0 will switch to that

wdennis
2018-02-14 19:57
@greg must have missed that...

shane
2018-02-14 19:57
we're still working out all the kinks between Feature Flags, UX Endpoint Version, DRP Endpoint Version ... and applying appropriate guardrails on those things

greg
2018-02-14 19:57
anyway, it does. It will be resolved on v3.7.0 stable - ux, plugins, and all will have better trigger to prevent bad behaviour.

wdennis
2018-02-14 19:57
good to hear

wdennis
2018-02-14 19:59
No way to roll back? (Or if v3.7 in the next few days, I could just wait and upgrade...)

greg
2018-02-14 19:59
or move to tip drp

wdennis
2018-02-14 20:00
Can upgrade from current tip to future stable?

shane
2018-02-14 20:00
yep

wdennis
2018-02-14 20:01
I usually do `tools/install.sh --isolated --upgrade install` -- what do i need to add to get `tip`?

shane
2018-02-14 20:05
depends - do you have a copy of the `stable` or `tip` install.sh script ? (note: `tip` has `--version` flag - don't look at the usage, look at the `case` statement)

wdennis
2018-02-14 20:05
prolly `stable`

shane
2018-02-14 20:05
I fixed a bit of the version stuff in there - but it's only in `tip` - after v3.7.0 publishes, it'll be in `stable`

shane
2018-02-14 20:06
I'd suggest:
```
curl -s get.rebar.digital/tip -o install.sh
bash ./install.sh install --isolated --upgrade --force --version=tip
```

wdennis
2018-02-14 20:08
Cool, done

wdennis
2018-02-14 20:10
On `v3.6.0-tip-149-4d49d65825eaab25ce0e3bfde8871d3ee05337db`

wdennis
2018-02-14 20:11

wdennis
2018-02-14 20:27
In a machine, was `select-kickseed` a `string` before, but now an `object`?

greg
2018-02-14 20:27
it should always be a string.

wdennis
2018-02-14 22:10
@greg looks like it's an object now in the UX...

wdennis
2018-02-14 22:10

wdennis
2018-02-14 22:11
And on the last install, it did not seem to use my custom `necla-ubu-seed.tmpl`

greg
2018-02-14 22:14
Is the `select-kickseed` in the parameters list in the UX?

greg
2018-02-14 22:15
What version is the content? I think select-kickseed is in tip content as a parameter.

wdennis
2018-02-14 22:15
It is not.

wdennis
2018-02-14 22:16
drp-community-content is at v1.1.0 in "Content Packages"

greg
2018-02-14 22:16
That is really old.

wdennis
2018-02-14 22:16
Version inspections says: ```drp-community-content content Major Upgrade from v1.1.0 to v1.5.0```

wdennis
2018-02-14 22:17
However... I can't upgrade, the upgrade button is greyed in Content Packages

wdennis
2018-02-14 22:18
(a bug reported to @zehicle that I believe he said he has a fix for...)

greg
2018-02-14 22:18

wdennis
2018-02-14 22:19
aha

wdennis
2018-02-14 22:19
OK, fixed, thanks

wdennis
2018-02-14 22:20
I'll be glad to get back to stable when it's 3.7.x and use the stable UX

marcelo
2018-02-15 15:14
has joined #json

shane
2018-02-15 16:04
@marcelo $welcome

2018-02-15 16:04
Digital Rebar community welcome information is here > http://rebar.digital/community/welcome.html

marcelo
2018-02-16 01:46
Thank you @shane I can't wait to start automating baremetal builds.. looking to integrate it with Terraform and Ansible for configuration management

sevans
2018-02-16 21:22
has joined #json

zehicle
2018-02-19 05:43
@sevans $welcome !

2018-02-19 05:43
Digital Rebar community welcome information is here > http://rebar.digital/community/welcome.html

zehicle
2018-02-19 05:44
slackbot help

2018-02-19 05:44
Available Commands: FAQ, $FAQ, $faq, $KRIB, $krib, $meetup, $Meetup, $issue, $Issue, $issues, $Issues, $quickstart, $QuickStart, $welcome, $conduct, $code-of-conduct

marcelo
2018-02-19 06:54
Howdy all, question... If I want to use RackN DR Provision at a site with no internet connectivity how can I access the features which require a RackN login?

zehicle
2018-02-19 07:23
@marcelo Provision does NOT require connectivity to function - it does not connect to the internet. We (RackN) can help you adjust templates so that the O/S installs only use local resources too. The UX is a cross-origin application that uses your browser to connect between the end-point and our SaaS. In that way, all management is actually behind your firewall. DR Provision is not "going through" the public internet in any way. We also offer a license of the UX that also runs on-prem ("air gap") so no connection is needed at all.

marcelo
2018-02-19 07:26
Ok Thanks for the clarification @zehicle .. let me run a few builds and simulations and see how we go..

zehicle
2018-02-19 16:12
we're going to do a short video explaining this - it's a common question and our approach is unique since it's based with an on-prem support mentality with SaaS to support management.

michael.harp
2018-02-19 17:05
has joined #json

abrinded
2018-02-19 17:46
has joined #json

chermack
2018-02-19 18:56
Michael, Andrew welcome aboard

detiber
2018-02-20 03:55
Just wanted to say great work on the UEFI support in the latest tip release, I am now able to boot two problematic machines using ipxe.efi that do not boot with the stable release!

shane
2018-02-20 03:56
Awesome!

vlowther
2018-02-20 17:07
@detiber Which machines are they? Got any more problematic ones?

detiber
2018-02-20 17:09
@vlowther A liva x mini-pc and an asus sabertooth x79 system were the ones that were giving me problems before. I have others that may be problematic, but they are also arm64, so I haven't started tackling that yet :slightly_smiling_face:

vlowther
2018-02-20 17:39
Well, arm64 will be fun in more ways than one. :slightly_smiling_face:

dave.parker
2018-02-20 19:31
has joined #json

dave.parker
2018-02-20 19:34
Hi folks. I have a few questions. I'm following the quick start, and using virtualbox guests for both the server and first client. I can boot sledgehammer and discover just fine, but when I try to switch bootenvs to install I get this error: ```Error: ValidationError: machines/0d31f21f-ab03-4fbb-9b19-ba3f445edadb: Can not change bootenv while in a stage unless forced. old: sledgehammer new ubuntu-16-04-install```

dave.parker
2018-02-20 19:34
No amount of forcing from the command line fixes this. If I go into the gui and edit the machine and select the Force checkbox, I can then change the bootenv from the command line, though.

dave.parker
2018-02-20 19:35
However, on the next boot, the machine boots sledgehammer and repeats discovery instead of doing the install as expected. It registers with a new UUID, which I assume is why it comes up as a brand new machine.

dave.parker
2018-02-20 19:35
How is the UUID determined?

shane
2018-02-20 19:39
@dave.parker what DRP Endpoint version are you using? (`drpcli info get | grep version`)

dave.parker
2018-02-20 19:41
"version": "v3.6.0-0-0e5ccf678a3e5b5fdb10f86261247cd28c858ac0"

shane
2018-02-20 19:41
also - make sure you set the Stage with the Install bootenv you'd like to install - do not switch the BootEnv itself directly ... if you are using Stages, you can not change the bootenv - as the Stage references an existing BootEnv

shane
2018-02-20 19:42
we have a LOT of fixes and enhancements in our current `tip` release - which is about to be released in the next 1 or 2 days at v3.7.0 - if this is a non-production scenario - I highly recommend upgrading to the `tip` release

dave.parker
2018-02-20 19:42
So I shouldn't be trying to change the bootenv directly?

shane
2018-02-20 19:42
not if you are using Stages, no

dave.parker
2018-02-20 19:42
Is the quickstart out of date then?

dave.parker
2018-02-20 19:42
Ok, I can upgrade to tip.

shane
2018-02-20 19:43
which version of QuickStart are you using? (`latest`, `stable`, etc) ?


dave.parker
2018-02-20 19:43
So, stable.

shane
2018-02-20 19:44
(upgrade to tip: `curl -s get.rebar.digital/tip | bash -s -- install --isolated --version=tip --upgrade --force` <-- assumes you are using "isolated" install mode, not production)

shane
2018-02-20 19:44
yes - please switch Doc to the `latest` version

shane
2018-02-20 19:44
you can use lower right floating selector for that

dave.parker
2018-02-20 19:44
Gotcha

dave.parker
2018-02-20 19:46
latest still has the same bootenvs command though. Hrm.

dave.parker
2018-02-20 19:46
Anyway, upgrading.

shane
2018-02-20 19:47
I'll run through the quickstart and validate - it probably needs to be updated to say Stages

dave.parker
2018-02-20 19:50
:thumbsup:

shane
2018-02-20 20:11
@dave.parker - yes, change the `drpcli machines bootenv ...` command to `drpcli machines stage ...` - it is otherwise identical ... long story short ... with the `defaultStage` preference set, a machine will use the Stage system. Therefore, you have to use Stages to change a Machine between bootenvs ... If you leave `defaultStage` set to `none`, then Stages won't be enabled (effectively) - and the use of `bootenv` is correct. I'll clean up the QuickStart documents around that this afternoon.
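
A minimal shell sketch of the rule described here: use `drpcli machines stage` when `defaultStage` is set, and `drpcli machines bootenv` when it is `none`. The helper name `machine_subcommand`, the placeholder `MACHINE-UUID`, and treating an empty value like `none` are assumptions for illustration; the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical helper: choose the drpcli machines subcommand from the
# value of the defaultStage preference.
# Assumption: an empty value is treated the same as "none".
machine_subcommand() {
  if [ -z "$1" ] || [ "$1" = "none" ]; then
    echo "bootenv"   # Stages effectively disabled: change the bootenv directly
  else
    echo "stage"     # Stages in use: change the Stage instead
  fi
}

# Print (do not run) the command for a placeholder machine.
default_stage="discover"        # example value of the defaultStage preference
machine_id="MACHINE-UUID"       # placeholder, not a real machine
target="ubuntu-16.04-install"   # bootenv name from this log; a Stage of the same name is an assumption
echo "drpcli machines $(machine_subcommand "$default_stage") $machine_id $target"
```

With `default_stage` set to `none`, the same script prints the `bootenv` form of the command instead.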

dave.parker
2018-02-20 20:13
Ahh ok

dave.parker
2018-02-20 20:13
Thank you!

dave.parker
2018-02-20 20:13
I will give that a try once I'm done reloading install ISOs

dave.parker
2018-02-20 20:27
Ok, so now I have a machine discovered, and I was able to set the stage to the ubuntu install, and everything looks good. But when I reboot the system and it tries to PXE boot, dr-provision crashes:

dave.parker
2018-02-20 20:27
```
Tried to access unlocked resource tasks
panic: Tried to access unlocked resource tasks

goroutine 103 [running]:
log.Panicf(0xe121f9, 0x24, 0xc423ffccf8, 0x1, 0x1)
    /home/travis/.gimme/versions/go1.9.linux.amd64/src/log/log.go:337 +0xda
github.com/digitalrebar/provision/backend.(*DataTracker).lockEnts.func1(0xdf96b1, 0x5, 0xc423ee3b30)
    /home/travis/gopath/src/github.com/digitalrebar/provision/backend/dataTracker.go:574 +0x143
github.com/digitalrebar/provision/backend.(*RequestTracker).stores(0xc4240b43c0, 0xdf96b1, 0x5, 0xc423f4aa58)
    /home/travis/gopath/src/github.com/digitalrebar/provision/backend/requestTracker.go:128 +0x3e
github.com/digitalrebar/provision/backend.(*RequestTracker).(github.com/digitalrebar/provision/backend.stores)-fm(0xdf96b1, 0x5, 0x0)
    /home/travis/gopath/src/github.com/digitalrebar/provision/backend/machines.go:421 +0x3e
github.com/digitalrebar/provision/backend.(*Machine).Validate(0xc4240707e0)
    /home/travis/gopath/src/github.com/digitalrebar/provision/backend/machines.go:422 +0x321
github.com/digitalrebar/provision/backend.(*Machine).BeforeSave(0xc4240707e0, 0x12e11c0, 0xc4240707e0)
    /home/travis/gopath/src/github.com/digitalrebar/provision/backend/machines.go:534 +0xb4
github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store.save(0x12e78e0, 0xc423e46d80, 0x12e11c0, 0xc4240707e0, 0x823d4b, 0xc420248b68, 0x1ad71e8)
    /home/travis/gopath/src/github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store/keySaver.go:166 +0x1a4
github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store.Save(0x12e78e0, 0xc423e46d80, 0x12e11c0, 0xc4240707e0, 0xc4240707e0, 0x1, 0xc424006510)
    /home/travis/gopath/src/github.com/digitalrebar/provision/vendor/github.com/digitalrebar/store/keySaver.go:188 +0x49
github.com/digitalrebar/provision/backend.(*RequestTracker).Save(0xc4240b43c0, 0x12df2c0, 0xc4240707e0, 0xc424047b60, 0x3, 0x3)
    /home/travis/gopath/src/github.com/digitalrebar/provision/backend/requestTracker.go:324 +0x1c4
github.com/digitalrebar/provision/midlayer.(*DhcpRequest).coalesceOptions.func1(0xc4240387a0)
    /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:330 +0xb98
github.com/digitalrebar/provision/backend.(*RequestTracker).Do(0xc4240b43c0, 0xc423ffd6c8)
    /home/travis/gopath/src/github.com/digitalrebar/provision/backend/requestTracker.go:111 +0xd9
github.com/digitalrebar/provision/midlayer.(*DhcpRequest).coalesceOptions(0xc4238140c0, 0xc42405e300, 0xc42409fd10, 0x0)
    /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:274 +0xd6d
github.com/digitalrebar/provision/midlayer.(*DhcpRequest).buildDhcpOptions(0xc4238140c0, 0xc42405e300, 0xc42409fd10, 0x0, 0xc4240627fc, 0x4, 0x4)
    /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:441 +0x98
github.com/digitalrebar/provision/midlayer.(*DhcpRequest).ServeDHCP(0xc4238140c0, 0xc42409e701, 0x4, 0xc42407e148, 0x1)
    /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:779 +0xcb7
github.com/digitalrebar/provision/midlayer.(*DhcpRequest).Process(0xc4238140c0, 0x0, 0x0, 0xffffffffffffffff)
    /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:836 +0x85f
github.com/digitalrebar/provision/midlayer.(*DhcpRequest).Run(0xc4238140c0)
    /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:864 +0x2b
created by github.com/digitalrebar/provision/midlayer.(*DhcpHandler).Serve
    /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:920 +0x2e5
```

dave.parker
2018-02-20 20:28
I must have missed something.

shane
2018-02-20 20:29
@greg and/or @vlowther will have to take a look... I'm in a mtg right now....

vlowther
2018-02-20 20:41
looking at it.

vlowther
2018-02-20 20:41
@dave.parker What version of dr-provision is this?

dave.parker
2018-02-20 20:42
"version": "v3.6.0-tip-186-ed32fbb1e324a3e033a55d131d9b067f7442f4d5"

dave.parker
2018-02-20 20:42
I deleted the machine, rediscovered, and then reset the stage to ubuntu-install, and now it's installing the OS as expected. Not sure what the problem was before...

vlowther
2018-02-20 20:43
hm.

vlowther
2018-02-20 20:44
that is in the codepath where the DHCP system wants to update the machine IP address

vlowther
2018-02-20 20:44
Does this machine have multiple nics it can boot off of connected to the same physical network?

dave.parker
2018-02-20 20:45
Nope.

vlowther
2018-02-20 20:47
ok

vlowther
2018-02-20 20:54
I will have a fix out for that issue shortly.

dave.parker
2018-02-20 20:58
Excellent

vlowther
2018-02-20 20:59

dave.parker
2018-02-20 21:11
I have another weird problem. When I do get the install to work, I can't log in. The params `provisioner-default-user` and `provisioner-default-password-hash` default to rocketskates/r0cketsk8ts correct?

dave.parker
2018-02-20 21:12
I tried overriding those in the global profile with my own values (with a hash I generated via mkpasswd) but that doesn't work either.

dave.parker
2018-02-20 21:19
I can boot the system off a regular ubuntu iso and go into rescue mode, get a shell, and see the username I tried to add is there, along with the hash in /etc/shadow. If I reset the password there I can then log in after rebooting it.

dave.parker
2018-02-20 21:19
I guess I'll try just grabbing this hash and putting it in the param

dave.parker
2018-02-20 21:37
Grr, stuck on that DHCP bug again. When that pull request gets approved and merged will I be able to grab the fix by reinstalling tip?

greg
2018-02-20 21:42
@dave.parker yes

greg
2018-02-20 21:42
Victor's fix should be in tip in about 30 minutes

dave.parker
2018-02-20 21:42
Awesome, thank you.

dave.parker
2018-02-20 21:47
Oh hey, manually changing the IP of the machine to the one dhcp is trying to give it fixes the issue and it boots into the install now. Sweet.

vlowther
2018-02-20 21:48
hmmm... manually changing it where?

dave.parker
2018-02-20 21:49
In the GUI I went to machines, and then clicked on the machine name, edited it, and put in the IP that was under the "leases" tab for this machine. Which was different than the one already there.

dave.parker
2018-02-20 21:50
That seemed to have fixed it? I think I also cleared out all the leases first though.

vlowther
2018-02-20 21:51
ok

dave.parker
2018-02-20 21:52
Yeah, I cleared out all the leases, tried to boot the machine again, and it failed again, but a new lease had popped up in the GUI. So that's the one I grabbed.

vlowther
2018-02-20 21:52
ok

dave.parker
2018-02-20 21:52
I'm old school, I basically kept trying bigger hammers until one of them "fixed" it...

vlowther
2018-02-20 21:53
I generally don't recommend clearing out all the leases

vlowther
2018-02-20 21:53
not unless you are going to whack all the machine records as well.

dave.parker
2018-02-20 21:53
Ah ok.

dave.parker
2018-02-20 21:53
Good to know.

dave.parker
2018-02-20 21:54
I will not use that particular hammer again if I can help it. :smile:

vlowther
2018-02-20 21:54
ya, it is just like dropping the lease database from any other DHCP server.

vlowther
2018-02-20 21:57
We keep the option around for troubleshooting purposes

vlowther
2018-02-20 21:59
basically, the Address field on a machine is the one we expect it to PXE from

dave.parker
2018-02-20 21:59
Ok

dave.parker
2018-02-20 22:02
FYI, the hash I pulled from the shadow file after manually changing the password (then threw back in the `provisioner-default-password-hash` param) worked fine. Not sure what was wrong with the hash provided as a default or the one I generated with mkpasswd.
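
One way to compare the hashes involved is to look at the crypt scheme prefix. The sketch below is only a sanity check, assuming the usual `$id$salt$hash` layout of crypt(3) hashes (`$1$` MD5, `$5$` SHA-256, `$6$` SHA-512). `hash_scheme` is a hypothetical helper, the sample hashes are made up, and nothing in this log establishes that a scheme mismatch was the actual cause here.

```shell
#!/bin/sh
# Hypothetical helper: report the crypt(3) scheme implied by a hash prefix.
hash_scheme() {
  case "$1" in
    \$6\$*) echo "sha512-crypt" ;;
    \$5\$*) echo "sha256-crypt" ;;
    \$1\$*) echo "md5-crypt" ;;
    \$*)    echo "other-modular" ;;      # some other $id$ scheme
    *)      echo "legacy-or-unknown" ;;  # no $id$ prefix at all
  esac
}

# Single quotes keep the shell from expanding the $ characters in a hash.
hash_scheme '$6$examplesalt$notarealhash'
hash_scheme 'ab1234567890c'
```

The same quoting concern applies when a hash is passed on a command line: inside double quotes the shell would try to expand parts such as `$6` before the value ever reaches the param.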

shane
2018-02-20 22:02
@dave.parker you can convert a Lease to a Reservation if you want the machine to retain a given IP addr over time

dave.parker
2018-02-20 22:03
Ok

shane
2018-02-20 22:04
we currently don't have a `drpcli` command to do this, but I wrote a dirty bash script which does - see: https://rackn.slack.com/files/U6QFVRJNB/F9AR6H56G/lease2res_sh_-_Lease_to_Reservation_conversion_script.sh

dave.parker
2018-02-20 22:21
Cool

shane
2018-02-20 22:22
You can change a Reservation ... well ... not really ... but you can convert the Lease to a new IP address Reservation from the DHCP assigned

shane
2018-02-20 22:24
however, that script doesn't add that ability - and it's not exactly a "supported" feature at the moment - there are some side effects of the Machine Object not getting updated correctly - I've filed a bug for this internally (https://github.com/digitalrebar/provision/issues/737)

shane
2018-02-20 22:25
right now, you'd have to delete the Reservation, and re-create the reservation with a new IP address ... since the IP Address field is the index for the Reservation object

vlowther
2018-02-20 22:59
If the HardwareAddrs field on the Machine is populated with the MAC addresses of the NICs on the machine, then things should work as you expect them to.

shane
2018-02-20 23:00
deleting the Reservation and creating a new one w/ the MAC addr and new IP addr works

shane
2018-02-20 23:00
but the issue is the Machine Object "Address" field doesn't get updated - it reflects the original Lease address

dave.parker
2018-02-20 23:00
I just pulled the latest version of tip since I saw that pr got merged.

vlowther
2018-02-20 23:02
@shane that is because machine.Address is initially populated by Sledgehammer when the machine is created, and the codepath in the DHCP subsystem that would update it if HardwareAddrs on the machine was populated was broken until 30 mins ago.

vlowther
2018-02-20 23:03
or so. :slightly_smiling_face:

dave.parker
2018-02-20 23:27
The fix works for me.

dave.parker
2018-02-20 23:27
```dr-provision2018/02/20 23:26:06.870664 [34:14]dhcp [ warn]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/dhcp.go:328 [34:14]bcc675dc-e333-49cd-8c86-eb17419a783b: Updating machine 10.10.10.28 address from 10.10.10.21 to %!s(MISSING)```

dave.parker
2018-02-20 23:28
Although that message looks wrong. It updated the machine from .28 to .21, not from .21 to `%!s(MISSING)`

dave.parker
2018-02-20 23:28
:smile:

dave.parker
2018-02-20 23:28
Anyway, it booted and is reinstalling which is what I expect to see.

dave.parker
2018-02-20 23:28
Thanks for your help today. I appreciate it.

vlowther
2018-02-20 23:35
bah, format string typos. :slightly_smiling_face:

amit.handa
2018-02-21 14:49
has joined #json

spector
2018-02-21 14:57
Welcome Amit

dave.parker
2018-02-21 18:30
Hi folks. Another question. The docs talk about setting up the DHCP server on a subnet to just be a relay, but there's no examples of how to do that that I can find.

dave.parker
2018-02-21 18:32
I'm not even sure that's what I want though. So I guess let me state my use case. I have a network that already has a DHCP server running on it, and I can't start another one that will compete with it, and I can't create another network or vlan, so I have to just play nicely with what's there. I *can* have that server forward to another server, or if it's possible I can just do an iPXE boot directly to the provisioning server. But I'm not sure which would be better or how to set up the dr-provision DHCP server to not stomp on the other one.

vlowther
2018-02-21 18:44
hm

vlowther
2018-02-21 18:44
We operate normally as a target of a DHCP relay.

vlowther
2018-02-21 18:45
just create a subnet that covers the IP address the relay lives in.

vlowther
2018-02-21 18:45
and then have the relay point to us.

vlowther
2018-02-21 18:49
The other options for coexisting with other DHCP infrastructure are to just point next-server in your current DHCP infrastructure to us (in which case you wouldn't create any subnets in Digital Rebar, and let your current DHCP infrastructure do all the heavy lifting), or you can use us as a ProxyDHCP server.

dave.parker
2018-02-21 18:49
If I'm on the same subnet I keep getting messages about how there "might be another DHCP server on this network" (there is!) and I just get conflicts. I guess my question is how do I configure the subnet so it doesn't try to be an active DHCP server and answer all requests?

dave.parker
2018-02-21 18:49
Ahhhh

dave.parker
2018-02-21 18:49
Ok, I think that's the setup I want.

dave.parker
2018-02-21 18:50
So I don't configure a subnet at all and just have the existing DHCP server pass to me via next-server? I think that's what I want.

vlowther
2018-02-21 18:50
Yep.

dave.parker
2018-02-21 18:50
Ok let me give that a whirl.

dave.parker
2018-02-21 18:50
Thanks.

vlowther
2018-02-21 18:51
Going that route you will need to configure next-server and bootfile appropriately for the nodes you will be booting

dave.parker
2018-02-21 18:52
Ok

vlowther
2018-02-21 18:55
for legacy BIOS nodes, you can just use lpxelinux.0

vlowther
2018-02-21 18:56
for UEFI systems, we use ipxe, which gets a little more complicated.

vlowther
2018-02-21 18:57
http://ipxe.org/howto/chainloading has the docs on setting up ipxe support.

vlowther
2018-02-21 18:58
and http://ipxe.org/howto/dhcpd#pxe_chainloading has ISC DHCPD specific instructions

dave.parker
2018-02-21 18:59
Excellent

vlowther
2018-02-21 19:02
The 4 files we use for bootloading are:
lpxelinux.0 <-- legacy BIOS support by default. No special config needed
ipxe.pxe <-- legacy BIOS support using ipxe. You need to "break the loop" as described in the first link in the IPXE docs.

vlowther
2018-02-21 19:03
ipxe.efi <-- UEFI booting using ipxe. You will also need to break the loop.

vlowther
2018-02-21 19:04
default.ipxe <-- the filename dhcpd should send when ipxe is loaded

vlowther
2018-02-21 19:05
Our DHCP server handles figuring out which file to serve behind the scenes, but for others you need to configure them upfront.
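
For an external DHCP server, the choice among those four files can be sketched roughly as below. This is an illustration, not Digital Rebar's own logic: `choose_bootfile` is a hypothetical helper, and the architecture codes (DHCP option 93: `0` for legacy BIOS, `7` or `9` for x86-64 UEFI) are an assumption based on what PXE firmware commonly reports. The `iPXE` user-class check is the "break the loop" step from the iPXE chainloading docs linked above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the decision an external DHCP server has to make.
# $1: client architecture code (DHCP option 93); assumption: 0 = legacy BIOS,
#     7 or 9 = x86-64 UEFI.
# $2: DHCP user-class; iPXE identifies itself as "iPXE".
choose_bootfile() {
  if [ "$2" = "iPXE" ]; then
    echo "default.ipxe"      # iPXE is already loaded: break the loop
    return 0
  fi
  case "$1" in
    0)   echo "lpxelinux.0" ;;   # legacy BIOS default
    7|9) echo "ipxe.efi" ;;      # UEFI clients chainload iPXE first
    *)   echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

choose_bootfile 0 ""        # first request from a legacy BIOS client
choose_bootfile 9 ""        # first request from a UEFI client
choose_bootfile 9 "iPXE"    # second request, sent by iPXE itself
```

A BIOS setup that prefers iPXE would return `ipxe.pxe` instead of `lpxelinux.0` in the first branch of the `case`.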

dave.parker
2018-02-21 19:11
:thumbsup:

dave.parker
2018-02-21 19:12
So I don't need to do anything special to get UEFI machines to boot if I use the integrated DHCP server?

dave.parker
2018-02-21 19:12
That's sweet.

dave.parker
2018-02-21 19:12
I can use that config in our core sites, which will be good.

vlowther
2018-02-21 19:13
You can also run us in ProxyDHCP mode -- create subnets for the address ranges you want to use and set Proxy to true.

vlowther
2018-02-21 19:14
The tradeoff is that I have not tested to see if it works through a DHCP relay.

vlowther
2018-02-21 19:15
and you have to relay your DHCP traffic to us as well as your usual DHCP servers.

vlowther
2018-02-21 19:15
Oh, and we don't handle UEFI arm boxes (32 or 64 bit) yet. :slightly_smiling_face:

dave.parker
2018-02-21 19:16
That's not a problem for me thankfully.

amit.handa
2018-02-22 04:12
I am using 3.2.6 drp, trying to learn it by pxe-booting a virtualbox vm on my laptop. I am receiving `pxe-e32: tftp open timeout`. In the server logs, I am getting `[0:153]TFTP: lpxelinux.0: transfer error: read udp [::]:33657: i/o timeout`. I am able to run `tftp <ip>` followed by `get lpxelinux.0` successfully. How can I debug it further? Thanks,

zehicle
2018-02-22 05:51
I'm assuming you mean v3.6. Did you set a --static-ip address? If so, what is it?

zehicle
2018-02-22 05:52
Also, did you set your vboxnet0 subnet?

zehicle
2018-02-22 05:53
any chance you put it in a container or have a firewall blocking traffic?

amit.handa
2018-02-22 07:30
hi

amit.handa
2018-02-22 07:30
managed to get it to work

amit.handa
2018-02-22 07:31
I had specified incorrect subnet settings (next-server, specifically)

amit.handa
2018-02-22 07:31
I used wireshark to debug it

amit.handa
2018-02-22 07:31
sorry, I am learning the ropes of setting up kubernetes cluster via digital rebar

amit.handa
2018-02-22 07:32
yes, I mean v3.6

amit.handa
2018-02-22 09:13
Thanks for the information, Ideally, it should be there in the docs. currently, it needs improvement IMO :slightly_smiling_face:

shane
2018-02-22 13:44
@amit.handa - no worries ... in our next release (due out today, in fact), v3.7.0 - we have a new feature that should mean you do not need to specify the `next-server` in your subnet - it automatically inserts the value for you

amit.handa
2018-02-22 14:15
cool

amit.handa
2018-02-22 14:15
I have added drp as next-server for an existing dhcp deployment in our company.

amit.handa
2018-02-22 14:15
existing one is WDS

amit.handa
2018-02-22 14:16
I need to migrate the existing windows install images to drp

amit.handa
2018-02-22 14:16
hope it should be straight forward ?

amit.handa
2018-02-22 14:16
I'll update

shane
2018-02-22 14:16
windows image provisioning is entirely possible w/ DRP, however, it's not available in the Open Community pieces - that is an advanced RackN functionality piece

shane
2018-02-22 14:17
we have done windows images for other customers, so we can do it, but it's not as straight forward ... because ... Windows ...

amit.handa
2018-02-22 14:17
yup

amit.handa
2018-02-22 14:17
but then the issue becomes how do I support both

amit.handa
2018-02-22 14:18
if it is feasible at all with community version

dave.parker
2018-02-22 15:44
Could someone give me a brief explanation of what it means to make a machine "runnable" or what the runner is/does? I'm kind of confused on that.

dave.parker
2018-02-22 15:45
If the "runner" is waiting can you assign a stage to a machine and have it immediately start doing its thing?

shane
2018-02-22 15:45
Sure! The runner is simply an agent used during the install process to enable the job queue and tasks to be executed. The Runner is ... actually ... just the `drpcli` binary put in to a special mode to listen for jobs to execute during the Stage transitions.

dave.parker
2018-02-22 15:45
Ahh

dave.parker
2018-02-22 15:46
So when the host is sitting at the OS login prompt after the sledgehammer boot, the runner is waiting for further instructions basically?

shane
2018-02-22 15:47
The Runner (drpcli) runs in the Sledgehammer (discovery) stages, and executes work. By default and by design, it "dissolves" and does not remain resident after install. However, for larger and more complex full lifecycle management solutions - we can leave the Runner in place as a long-lived service (ala `systemd`, etc.) that enables deeper integration for lifecycle management and other enforcement activities, if you should choose.

shane
2018-02-22 15:47
Yes - generally speaking, as long as you have not run a Stage/Task that disables the runner ... :slightly_smiling_face:

shane
2018-02-22 15:49
you can check for it by simply doing `ps -ef | grep drpcli | grep -v grep` on a target Machine

dave.parker
2018-02-22 15:49
Ok cool.

dave.parker
2018-02-22 15:50
Thank you.

shane
2018-02-22 15:52
an example stage that disables the Runner is `complete-nowait`

shane
2018-02-22 15:52
inversely, the stage `complete` marks the Machine as done, but leaves the Runner ... ahem ... running

dave.parker
2018-02-22 15:52
Hehe

dave.parker
2018-02-22 15:54
It's starting to make sense now. I'm starting to wrap my head around everything.

zehicle
2018-02-22 16:12
@amit.handa Good work getting your first machines booted!! Just to clarify: There is a single code base for Digital Rebar Provision (DRP), which includes the provisioning service and the client. DRP is APLv2 licensed. RackN offers commercial support of that code and pushes patches and features back into the open (community) repos. We also offer a significant amount of RackN advanced content, plugins, and functionality for the DRP ecosystem that are sold commercially and, in many cases, offered without charge to the community.

amit.handa
2018-02-22 16:24
Thanks. I am pretty happy myself. I was unsure about DRP since I couldn't get in touch with its community. Now I have :slightly_smiling_face: I'll definitely go through the DRP as well as RackN components.

mohd.mehdim
2018-02-22 20:18
has joined #json

spector
2018-02-22 21:57
Welcome Mohammed Mehdi

mohd.mehdim
2018-02-22 22:59
Thanks Spector

spector
2018-02-22 23:00
There is a FAQ, I think this will bring it up for you $Welcome

2018-02-22 23:00
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

mohd.mehdim
2018-02-22 23:03
Yeah I have been doing some reading about digital rebar for the last one week. Trying to get this up and running in virtualbox.

shane
2018-02-22 23:04
VirtualBox is a bit of a pain - because it tries to interfere w/ your DHCP services, and you may have some other issues if you're on a Mac

mohd.mehdim
2018-02-22 23:04
yeah I am currently running into dhcp issues on my mac


mohd.mehdim
2018-02-22 23:08
Yeah trying to use hostonly network on both dr and client but for some reason it's getting dhcp from somewhere else. Disabled virtualbox dhcp along with docker daemon.

shane
2018-02-22 23:09
for my Mac - I did this:
1. setup a single VM as my DRP Endpoint, w/ 2 NICs
   a) 1st Bridged to my WiFi (or your local LAN)
   b) 2nd as Host-Only (using vboxnet0) - this way, my Mac can participate in connecting to the VMs (Machines) directly too
2. set up my VMs with only 1 NIC, connected to Host-Only (vboxnet0)
3. on the DRP Endpoint, turn on packet forwarding (routing), and add IPTables NAT rules
4. disable DHCP on vboxnet0 - and then KILL the DHCP server that doesn't stop when you disable it

shane
2018-02-22 23:09
when you disable it - it does NOT stop the DHCP service - you have to kill it, after disabling it
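
Step 3 (packet forwarding plus NAT on the DRP Endpoint VM) might look roughly like the following. This is a sketch under assumptions: the interface names `enp0s3` (Bridged) and `enp0s8` (Host-Only) are guesses, `nat_commands` is a hypothetical helper, and the rule set is one common masquerading setup rather than the exact rules used in this conversation. The helper only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands instead of running them.
# $1: uplink interface (the Bridged NIC)
# $2: Host-Only interface (the vboxnet0 side)
nat_commands() {
  echo "sysctl -w net.ipv4.ip_forward=1"
  echo "iptables -t nat -A POSTROUTING -o $1 -j MASQUERADE"
  echo "iptables -A FORWARD -i $2 -o $1 -j ACCEPT"
  echo "iptables -A FORWARD -i $1 -o $2 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
}

# Interface names are assumptions; check `ip link` inside the VM.
nat_commands enp0s3 enp0s8
```

These settings do not persist across reboots on their own, and the sketch does not cover step 4 (killing the leftover VirtualBox DHCP server on the host).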

mohd.mehdim
2018-02-22 23:09
aha..need to kill the dhcp service...probably that's it, otherwise my setup is exactly the same as yours

shane
2018-02-22 23:12
I also added an IP addr to my Mac vboxnet0 network in the VirtualBox management tool - static IP assignment

mohd.mehdim
2018-02-22 23:16
ok so it gets an ip from dr but getting error ```Exec format error```

mohd.mehdim
2018-02-22 23:17
looks like some configuration issue with sledgehammer

mohd.mehdim
2018-02-22 23:18

shane
2018-02-22 23:22
hmm - not sure off the top of my head - I just upgraded my DRP version to the latest v3.7.0 (incidentally - you were the FIRST to download it - mere seconds after it released ... )

shane
2018-02-22 23:22
I was able to boot a VM smoothly

shane
2018-02-22 23:23
did you use the `--static-ip` flag when you started DRP ?

mohd.mehdim
2018-02-22 23:23
yeah

mohd.mehdim
2018-02-22 23:23
oh this is brand new :sunglasses:

shane
2018-02-22 23:28
can you please copy-n-paste the process listing/options for your running DRP Endpoint ? (`ps -ef | grep dr-provision | grep -v grep`)

mohd.mehdim
2018-02-22 23:32
```
root 1510 1442 0 17:58 pts/0 00:00:00 sudo ./dr-provision --static-ip=192.168.99.201 --base-root=/root/drp-data --local-content= --default-content=
root 1512 1510 0 17:58 pts/0 00:00:01 ./dr-provision --static-ip=192.168.99.201 --base-root=/root/drp-data --local-content= --default-content=
```

shane
2018-02-22 23:33
did you create a `subnet` for 192.168.99.0/24 ?

mohd.mehdim
2018-02-22 23:34
yes

shane
2018-02-22 23:34
can you please provide the output of `drpcli subnets show <NAME_OF_SUBNET>` ?

shane
2018-02-22 23:34
we have some new changes that make adding a subnet not necessary - but I'd like to inspect what you set there

shane
2018-02-22 23:35
@shane uploaded a file: https://rackn.slack.com/files/U6QFVRJNB/F9CSHS532/screen_shot_2018-02-22_at_15.35.03.png and commented: this is my vbox VM booting up against a brand new v3.7.0 DRP endpoint

shane
2018-02-22 23:40
Ok - I replicated the issue - but only by deleting my Subnet - and I get the same errors you see when I do that

zehicle
2018-02-22 23:41
@mohd.mehdim which UX URL are you using?


shane
2018-02-22 23:45
:slightly_smiling_face: I can't access your DRP Endpoint via the Portal - I have to have direct access to the 192.168.99.201 IP address for that to work

mohd.mehdim
2018-02-22 23:45
```
{
  "ActiveEnd": "192.168.99.254",
  "ActiveLeaseTime": 60,
  "ActiveStart": "192.168.99.250",
  "Available": true,
  "Description": "",
  "Enabled": true,
  "Errors": [],
  "Meta": {},
  "Name": "local_subnet",
  "NextServer": "",
  "OnlyReservations": false,
  "Options": [
    { "Code": 1, "Value": "255.255.255.0" },
    { "Code": 3, "Value": "192.168.99.1" },
    { "Code": 6, "Value": "8.8.8.8" },
    { "Code": 15, "Value": "http://example.com" },
    { "Code": 28, "Value": "192.168.99.255" }
  ],
  "Pickers": [
    "hint",
    "nextFree",
    "mostExpired"
  ],
  "Proxy": false,
  "ReadOnly": false,
  "ReservedLeaseTime": 7200,
  "Strategy": "MAC",
  "Subnet": "192.168.99.0/24",
  "Unmanaged": false,
  "Validated": true
}
```

mohd.mehdim
2018-02-22 23:46
yeah that ip is on hostonly network

shane
2018-02-22 23:46
that's a security mechanism - DRP Endpoint NEVER talks to the Portal directly - it only talks to the management workstation (aka your laptop) ... and provides a "passthrough" connection between the Portal and the Endpoint via the single-page React application that's running in your browser

mohd.mehdim
2018-02-22 23:48
So, the above is indeed the UX URL right?

shane
2018-02-22 23:48
yes - that's the Production version of the UX (our Stable Portal)

shane
2018-02-22 23:50
Ok - you need to edit the Subnet you created, and add the Param for Option Code 67, and set it to the value "lpxelinux.0"

shane
2018-02-22 23:50
`drpcli subnets set local_subnet option 67 to "lpxelinux.0"`

shane
2018-02-22 23:51
the DHCP changes in v3.7.0 release have made it necessary to now add that - I'll update the quickstart doc right now
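
A quick way to check whether a subnet already carries option 67 before setting it could look like this. It is a minimal sketch: `subnet_has_option` is a hypothetical helper that greps for the `"Code": N` layout seen in the `drpcli subnets show` output earlier in this conversation, and a JSON-aware tool would be more robust.

```shell
#!/bin/sh
# Hypothetical helper: reads subnet JSON on stdin and succeeds if an
# option with the given code is present. The trailing [^0-9] keeps
# code 6 from matching code 67.
subnet_has_option() {
  grep -Eq "\"Code\":[[:space:]]*$1[^0-9]"
}

# Intended use against a live endpoint (not executed here):
#   drpcli subnets show local_subnet | subnet_has_option 67 ||
#     drpcli subnets set local_subnet option 67 to "lpxelinux.0"

# Self-contained demonstration with a trimmed sample of the subnet JSON.
sample='{ "Options": [ { "Code": 1, "Value": "255.255.255.0" }, { "Code": 6, "Value": "8.8.8.8" } ] }'
if printf '%s\n' "$sample" | subnet_has_option 67; then
  echo "option 67 present"
else
  echo "option 67 missing"
fi
```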

mohd.mehdim
2018-02-22 23:53
cool..let me try that

mohd.mehdim
2018-02-22 23:55
Works :+1:

shane
2018-02-22 23:55
woot! woot!

mohd.mehdim
2018-02-22 23:58
So, can we use static names for machine names and use it to assign bootenv?

shane
2018-02-23 00:01
yep - just Edit the machine and change the name

shane
2018-02-23 00:01
you can do that in the UX, or via `drpcli` - see $FAQ for a FAQ note on that


mohd.mehdim
2018-02-23 00:02
I am more of a cli guy and will automate this provisioning using ansible

shane
2018-02-23 00:03
the FAQ has the CLI option there


shane
2018-02-23 00:04
We have some limited support for Ansible playbooks - we demonstrated it using the Kubernetes Kubespray Ansible playbook


shane
2018-02-23 00:04
note: it's been a while since we exercised that code - so it may be a bit crusty

mohd.mehdim
2018-02-23 00:07
cool...let me look into it...thanks

mohd.mehdim
2018-02-23 00:08
Another question, I see that DR has the capability of doing hardware RAID. Does it support any vendor or specific vendors?

shane
2018-02-23 00:09
we do - but that is a RackN commercial piece - if you're interested in that we should discuss that in context of the Trial you're running

greg
2018-02-23 01:46
- The release is out!

greg
2018-02-23 01:47
DRP stable is now v3.7.0. Content packages are v1.6.0. Plugins are now at v2.0.0. UX is now at v1.0.0.

greg
2018-02-23 01:47
If you update to stable, please immediately update your plugins to their v2.0.0 counterparts.

greg
2018-02-23 01:47
Updating default content will require a sledgehammer update.

greg
2018-02-23 01:48

greg
2018-02-23 01:49
UX release notes:


zehicle
2018-02-23 02:33
:nerd_face: great news. This is a big release w plugins, lots of bug fixes, dhcp and other updates. Well done!!

dave.parker
2018-02-23 18:32
Huh.

dave.parker
2018-02-23 18:32
I just installed the new stable, and when I try to do `drpcli bootenvs uploadiso sledgehammer` I get `Error: GET: bootenvs/sledgehammer: Not Found`

dave.parker
2018-02-23 18:34
Oh, this is probably why:

dave.parker
2018-02-23 18:34
```
Installing Version stable of Digital Rebar Provision Community Content
Failed to dowload content.
Failed to download sha of content.
sha256sum: drp-community-content.yaml: No such file or directory
drp-community-content.yaml: FAILED open or read
sha256sum: WARNING: 1 listed file could not be read
```

dave.parker
2018-02-23 18:37
Going into the GUI under "Info & Preferences" and clicking the "Content" link under System Wizard and manually transferring community-content and community-contrib fixed it.

greg
2018-02-23 18:39
how did you install?

greg
2018-02-23 18:39
I found this and I'm fixing it shortly, I think.

greg
2018-02-23 18:39
Actually, it is in tip/tools/install.sh

greg
2018-02-23 18:40
I'll crank a 3.7.1 with a couple of fixes here shortly.

dave.parker
2018-02-23 18:40
`curl -fsSL get.rebar.digital/stable | bash -s -- --isolated install`

greg
2018-02-23 18:41
hmm - when did you do the install?

greg
2018-02-23 18:42
just probably means now. :slightly_smiling_face:

dave.parker
2018-02-23 18:42
About ten minutes ago?

greg
2018-02-23 18:42
okay - thinking . It worked for me.

greg
2018-02-23 18:42
oh - I wonder if the tree was gyrating because of what I was doing. maybe.

greg
2018-02-23 18:43
Your recovery method was and is sound though.

dave.parker
2018-02-23 18:43
Cool

dave.parker
2018-02-23 20:14
Reinstalled tip just a bit ago and had the problem again. Don't know why it doesn't work for me.

greg
2018-02-23 20:15
hmmm - okay

greg
2018-02-23 20:16
@dave.parker - can you do it in two steps for me?

greg
2018-02-23 20:16
`curl -fsSL get.rebar.digital/stable > install.sh`

greg
2018-02-23 20:16
`chmod +x install.sh`

greg
2018-02-23 20:16
edit to add `set -x` near the top.

greg
2018-02-23 20:16
`./install.sh --isolated install`
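
The "edit to add `set -x` near the top" step can also be scripted. A minimal sketch, assuming the downloaded script starts with a shebang line; `add_xtrace` is a hypothetical helper, and the demonstration runs against a throwaway file rather than the real installer.

```shell
#!/bin/sh
# Hypothetical helper: insert "set -x" as line 2 of a script, right after
# the shebang, so every command is echoed as it runs.
add_xtrace() {
  xtrace_tmp="$(mktemp)" || return 1
  {
    head -n 1 "$1"       # keep the shebang first
    echo "set -x"
    tail -n +2 "$1"      # rest of the script, unchanged
  } > "$xtrace_tmp"
  cat "$xtrace_tmp" > "$1"   # overwrite in place, preserving the file mode
  rm -f "$xtrace_tmp"
}

# Demonstrate on a throwaway script rather than the real installer.
demo="$(mktemp)"
printf '#!/bin/sh\necho hello\n' > "$demo"
add_xtrace "$demo"
cat "$demo"
rm -f "$demo"
```

Against the real file this would be `add_xtrace install.sh` before running `./install.sh --isolated install`. Running the script as `bash -x ./install.sh --isolated install` gives a similar trace without editing the file.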

dave.parker
2018-02-23 20:17
Sure I can try that.

greg
2018-02-23 20:17
Send me the output

greg
2018-02-23 20:17
please

dave.parker
2018-02-23 20:20

greg
2018-02-23 20:20
it seemed to work for you that time.

dave.parker
2018-02-23 20:20
Huh

dave.parker
2018-02-23 20:21
Yeah, sure did. I can grab sledgehammer just fine.

dave.parker
2018-02-23 20:21
Maybe it's some weird vagrant thing. Although I've got it to work in vagrant before.

greg
2018-02-23 20:21
It could be a networking path issue at times. :neutral_face:

greg
2018-02-23 20:21
hmmm

dave.parker
2018-02-23 20:22
I did manually try to reinstall once (not through vagrant up or vagrant provision, just going on the box and running the curl/install command) and that failed at that time.

dave.parker
2018-02-23 20:22
Well, let me try again.

greg
2018-02-23 20:23
I'm trying to help, but not sure what is going on. The thing I fixed this morning was if you specified `--version=v3.7.0`. It would fail for certain. You aren't doing that so it is some other problem.

dave.parker
2018-02-23 20:23
Ok.

greg
2018-02-23 20:24
Stable shouldn't be moving so the files should be there.

greg
2018-02-23 20:24
Sometimes tip can fail this way if you catch it during an update. The files are changing, but this isn't that either.

dave.parker
2018-02-23 20:27
It certainly doesn't help that my laptop keeps crashing.

dave.parker
2018-02-23 20:27
Ok I'm trying again with stable through vagrant.

dave.parker
2018-02-23 20:27
See if that works now.

dave.parker
2018-02-23 20:33
Huh, nope. Still doesn't work.

dave.parker
2018-02-23 20:33
That's really weird.

greg
2018-02-23 20:33
yeah - does the direct curl work? `curl -sfL -o drp-community-content.sha256 https://github.com/digitalrebar/provision-content/releases/download/stable/drp-community-content.sha256`

dave.parker
2018-02-23 20:36
It does

dave.parker
2018-02-23 20:43
It works when I run from the command line though.

dave.parker
2018-02-23 20:43
Something is screwy with the way it installs during vagrant provisioning.

dave.parker
2018-02-23 20:51
Ok I don't get this. Because I see this in the install:
```
master: Installing Version stable of Digital Rebar Provision Community Content
master: drp-community-content.yaml: OK
```

dave.parker
2018-02-23 20:51
But it still fails to find sledgehammer?

greg
2018-02-23 21:01
how are you running the dr-provision?

dave.parker
2018-02-23 21:07
A shell script that the vagrant provisioner runs.

dave.parker
2018-02-23 21:07
```
#! /bin/bash
mkdir dr-prov
chown vagrant:vagrant dr-prov
cd dr-prov
curl -fsSL get.rebar.digital/tip | bash -s -- --isolated install
nohup sudo ./dr-provision --static-ip=192.168.50.10 --base-root=/home/ubuntu/dr-prov/drp-data --local-content="" --default-content="" & &> /dev/null
sleep 10
chown -R vagrant:vagrant .
./drpcli bootenvs uploadiso sledgehammer
./drpcli bootenvs uploadiso ubuntu-16.04-install
./drpcli subnets create - < /tmp/subnet.json
./drpcli bootenvs create - < /tmp/ubuntu1404.json
./drpcli prefs set unknownBootEnv discovery defaultBootEnv sledgehammer defaultStage discover
```

dave.parker
2018-02-23 21:08
I just added the chown stuff. The first one seemed to fix the problem of the files not downloading at all, but it still doesn't seem to load properly.

dave.parker
2018-02-23 21:09
```
master: Installing Version stable of Digital Rebar Provision Community Content
master: drp-community-content.yaml: OK
master: # Run the following commands to start up dr-provision in a local isolated way.
master: # The server will store information and serve files from the drp-data directory.
master: sudo ./dr-provision --static-ip=10.0.2.15 --base-root=/home/vagrant/dr-prov/drp-data --local-content="" --default-content="" &
master:
master: # Once dr-provision is started, these commands will install the isos for the community defaults
master: ./drpcli bootenvs uploadiso ubuntu-16.04-install
master: ./drpcli bootenvs uploadiso centos-7-install
master: ./drpcli bootenvs uploadiso sledgehammer
master: dr-provision2018/02/23 20:57:55.488386 Version: v3.7.0-0-246bbac639d47f8302fdfd4642646aeb498f9d0c
master: dr-provision2018/02/23 20:57:55.492666 Extracting Default Assets
master: dr-provision2018/02/23 20:57:56.081949 Starting TFTP server
master: dr-provision2018/02/23 20:57:56.082273 Starting static file server
master: dr-provision2018/02/23 20:57:56.082411 Starting DHCP server
master: dr-provision2018/02/23 20:57:56.084521 Starting PXE/BINL server
master: dr-provision2018/02/23 20:57:56.084848 Starting API server
master: dr-provision2018/02/23 20:58:05.548566 [2:1]frontend [audit]: /home/travis/gopath/src/github.com/digitalrebar/provision/frontend/frontend.go:642
master: [2:1]Authenticated rocketskates - users token rocketskates - 127.0.0.1
master: Error: GET: bootenvs/sledgehammer: Not Found
master: dr-provision2018/02/23 20:58:05.639968 [6:2]frontend [audit]: /home/travis/gopath/src/github.com/digitalrebar/provision/frontend/frontend.go:642
master: [6:2]Authenticated rocketskates - users token rocketskates - 127.0.0.1
master: Error: GET: bootenvs/ubuntu-16.04-install: Not Found
```

dave.parker
2018-02-23 21:10
I'm going to try running the script manually from the command line in the vagrant box.

dave.parker
2018-02-23 21:12
It fails that way too. But the manual curl install worked. So I guess I'll go step by step until I find out what's not working...

greg
2018-02-23 21:16
this seems like you aren't finding the content in the drp-data directory.

greg
2018-02-23 21:17
check - `/home/vagrant/dr-prov/drp-data/saas-content` for files

greg
2018-02-23 21:17
@dave.parker - more info ^

dave.parker
2018-02-23 21:17
Hrm ok

dave.parker
2018-02-23 21:20
```
vagrant@dr-prov:~/dr-prov/drp-data/saas-content$ ls -la
total 88
drwxrwxr-x 2 vagrant vagrant 4096 Feb 23 21:16 .
drwxrwxr-x 9 vagrant vagrant 4096 Feb 23 21:16 ..
-rw-rw-r-- 1 vagrant vagrant 79008 Feb 23 21:16 default.yaml
```

dave.parker
2018-02-23 21:28
Huh, now it's not working with the curl command manually either.

dave.parker
2018-02-23 21:28
I'm stumped then.

dave.parker
2018-02-23 21:28
I'm going to try a non-vagrant machine.

zehicle
2018-02-23 21:29
@dave.parker if you are trying to build a system w/ DRP content staged, you can do it by populating the directory structure directly instead of using the APIs

dave.parker
2018-02-23 21:59
It seems to be something with vagrant. It's working fine installed on a hand-built host.

greg
2018-02-23 22:00
whew! I guess. :disappointed:

amit.handa
2018-02-24 07:16
install.sh is taking ages to download dr-provision.zip :disappointed:

amit.handa
2018-02-24 07:16
40KBPS max speed

amit.handa
2018-02-24 07:16
unable to upgrade

amit.handa
2018-02-24 07:17
any ideas on speeding up

greg
2018-02-24 14:37
Hmm. Not sure. It is in an s3 bucket.

zehicle
2018-02-24 18:18
FWIW - I've tried to get Vagrant working on multiple iterations of Rebar with limited success, but I'd been doing it on Linux desktops.

zehicle
2018-02-24 19:50
Greg and I recorded a video set about creating and bundling content. The example content is on http://github.com/digitalrebar/colordemo and the videos posted:
* Creating Content: https://youtu.be/79Y-3IOguZk
* Bundling: https://youtu.be/JUyzFNkLyZU

michael.harp
2018-02-26 16:52
http://provision.readthedocs.io/en/latest/doc/quickstart.html#install-your-first-machine Step-4 has typo, stages->stage `drpcli machines stage <UUID> ubuntu-16.04-install`

greg
2018-02-26 16:59
Thanks - fixing now.

florent.wagener
2018-02-26 18:31
are there any release notes for the 3.7.0 version? I have trouble finding them...


florent.wagener
2018-02-26 18:40
I'll give it a try this afternoon.

dave.parker
2018-02-26 19:04
3.7 has been pretty great to me so far.

greg
2018-02-26 19:56
I pushed a new release v3.7.1 - https://github.com/digitalrebar/provision/releases/tag/v3.7.1 Most of this is doc updates, a bug fix, and some Mac OSX and virtualbox ease of use stuff. Content updates as well, but small too. Some new icon and color updates. A couple of helper templates.

greg
2018-02-26 19:57
Also, docker containers are updated and versioned now as well. stable, latest, and v3.7.1 are out there.

rakeshrhcss
2018-02-27 11:08
has joined #json

wdennis
2018-02-27 16:12
@shane or anyone else - what is correct cmd line to upgrade from `v3.6.0-tip` to 3.7 stable?


shane
2018-02-27 16:16
(sorry I pasted `tip` doc - use the `latest`)

shane
2018-02-27 16:16
we put version to version upgrade notes there

shane
2018-02-27 16:16
if there is nothing, then you should assume there are no special requirements for your upgrade path

shane
2018-02-27 16:17
for your case ... "it depends" ... on what version of `3.6.0-tip` you were at

shane
2018-02-27 16:17
but the 3.6.0 to 3.7.0 notes will give you good results - eg update your plugins

wdennis
2018-02-27 16:19
Seeing this right after upgrade and start - but before upgrading plugins:
```
dr-provision2018/02/27 11:56:01.273119 Version: v3.7.1-0-b441dd1450c98be5317025c89668f85985eb65d8
dr-provision2018/02/27 11:56:01.273263 Extracting Default Assets
dr-provision2018/02/27 11:56:02.246097 [0:1]frontend [ info]: /home/travis/gopath/src/github.com/digitalrebar/provision/frontend/frontend.go:512
[0:1]Running Local UI from /home/dradmin/drp/drp-data/ux
dr-provision2018/02/27 11:56:02.430453 [0:2]plugin [error]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/controller.go:523
[0:2]Unpack for ipmi failed: exit status 1
dr-provision2018/02/27 11:56:02.430578 [0:3]plugin [error]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/controller.go:524
[0:3]Error: unknown command "unpack" for "ipmi"
Run 'ipmi --help' for usage.
dr-provision2018/02/27 11:56:03.448196 Starting TFTP server
dr-provision2018/02/27 11:56:03.449430 Starting static file server
dr-provision2018/02/27 11:56:03.449786 Starting API server
```

wdennis
2018-02-27 16:19
The `ipmi` failure is due to old plugin?

shane
2018-02-27 16:19
yep - you had plugins v1

shane
2018-02-27 16:20
you'll need to follow the 3.6.0 to 3.7.0 notes on updating your plugins to the new v2 goodness

wdennis
2018-02-27 16:20
No, just came from v2 plugins w/ 2.6.0-tip actually

wdennis
2018-02-27 16:20
2->3

shane
2018-02-27 16:20
(yep)

wdennis
2018-02-27 16:21
OK, did ipmi plugin upgrade, let's retest start...

wdennis
2018-02-27 16:21
Yup, clean start now

wdennis
2018-02-27 16:22
Need new sledgehammer, correct?

shane
2018-02-27 16:22
yes

shane
2018-02-27 16:24
not sure if you noticed - but you can use the version inspector with colorized (pretty-print) and Diff capability in the UX

shane
2018-02-27 16:24
you can see exactly what's going to be changed before performing an upgrade of Content

wdennis
2018-02-27 16:25
Is there an example in docs for downloading boot isos?

shane
2018-02-27 16:26
$quickstart has it

2018-02-27 16:26

wdennis
2018-02-27 16:27
cool, thx

shane
2018-02-27 16:27
no prob

shane
2018-02-27 16:28
I believe ... but may be lying to you ... that you can now do it through the "Boot ISOs" menu in the UX - I haven't tried that path yet

wdennis
2018-02-27 16:28
It downloads ISO to host running UX

shane
2018-02-27 16:28
ah

shane
2018-02-27 16:29
actually - I think the Upload action pushes ISO from your UX hosted management workstation

wdennis
2018-02-27 16:29
The `drpcli bootenvs uploadiso` does the explode etc right?

shane
2018-02-27 16:29
so if you have the ISO local - then you can push it to your endpoint from workstation

shane
2018-02-27 16:29
yep

wdennis
2018-02-27 16:29
OK

shane
2018-02-27 16:30
From the Boot Environments menu - you can select a BootEnv, which will have the source of the ISO/tarball - and download to your workstation - then use the Boot ISOs Upload command

shane
2018-02-27 16:30
or - as you are doing - use the CLI uploadiso helper

shane
2018-02-27 16:32
yep - the 2-step UX procedure works too

wdennis
2018-02-27 16:32
OK, everything looks good...

shane
2018-02-27 16:32
excellent !

wdennis
2018-02-27 16:33
Trying a 5-node reinstall, let's see what happens

shane
2018-02-27 16:33
it'll magically work !

wdennis
2018-02-27 16:33
(bye-bye old KRIB cluster :cry:)

shane
2018-02-27 16:34
there's some new feedback stuff in the new KRIB update ... which gives you visual change cues

shane
2018-02-27 16:34
@zehicle is excited about it - I'm going to give it a try and see how it looks today

wdennis
2018-02-27 16:34
Yeah, have to install a test Rancher cluster today, but then want to go back to stock k8s

shane
2018-02-27 16:35
did you get rancher working via DRP ?

wdennis
2018-02-27 16:35
No, not RancherOS - just Rancher-controlled infra nodes

shane
2018-02-27 16:35
ah

wdennis
2018-02-27 16:35
Kicking tires on their k8s installer

shane
2018-02-27 17:36
we hope to see you all in a short bit (11 am PST) for our v012 meetup. Meetup link: https://www.meetup.com/digitalrebar/events/247773442/

amontalban
2018-02-27 18:08
Hey guys, has anyone had to use full disk encryption on Ubuntu, setting the encryption key in preseed? Any way to automate random key generation with DigitalRebar or interact with Vault for it?

shane
2018-02-27 18:10
hi @amontalban - we haven't specifically done any FDE w/ DRP - nor specific Vault integration ... however, it should be pretty easy to author/change content to use the Vault command line to interact with your Vault store

amontalban
2018-02-27 18:10
Yeah, I think I will go that route

amontalban
2018-02-27 18:10
Thanks!

shane
2018-02-27 18:12
also `openssl` is installed by default in the Sledgehammer image - so you can use `openssl rand ...` to generate a number, which can be used also in a Stage

shane
2018-02-27 18:13
(eg `openssl rand -hex 50` to generate a 100 character random hex string; the argument is a byte count and each byte prints as two hex characters)
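
A small sketch of the `openssl rand` idea, assuming only that `openssl` is on the PATH. `openssl rand -hex N` prints N random bytes as 2N hex characters, so the helper below converts a desired character count into a byte count. The `gen_hex_key` name is illustrative and not part of DRP.

```shell
#!/usr/bin/env bash
# Sketch: generate a random hex passphrase with a given number of characters.
# `openssl rand -hex N` emits N bytes as 2*N hex characters, so request
# half the desired length (rounded up) and trim to the exact size.
gen_hex_key() {
  local chars="${1:-64}"
  local bytes=$(( (chars + 1) / 2 ))
  openssl rand -hex "$bytes" | cut -c "1-${chars}"
}

key="$(gen_hex_key 100)"
echo "generated ${#key} hex characters"
```

How the value then reaches the preseed (for example as a Param on the machine, as suggested in this thread) is left out of the sketch.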

amontalban
2018-02-27 18:14
Great thank you!

shane
2018-02-27 18:15
you can also use `drpcli` to store a Param on the machine w/ the randomly generated number, which can subsequently be used in your Stage(s) for the seed value

dave.parker
2018-02-27 18:59
Did anybody ever figure out what bizarre thing Vagrant is doing that doesn't play nice with dr-provision? I know @zehicle said something about having tried it with limited success.

dave.parker
2018-02-27 19:00
I gave up on it but would still love to get it working. It'd be ideal for me to be able to pass a kind of playground around for people to get familiar with.

greg
2018-02-27 19:04
virtualbox had issues with lpxelinux.

greg
2018-02-27 19:04
that has been fixed in 3.7.1

greg
2018-02-27 19:04
I don't remember the other issues in vagrant.

shane
2018-02-27 19:05
@dave.parker we were going to discuss that in the meetup that is starting RIGHT NOW

dave.parker
2018-02-27 19:11
Oh, I thought that was hours ago. Time zones are hard. Either way I can't jump on right now. :disappointed:

spector
2018-02-27 19:21
We record the meetups, it will be online a few hours from now

dave.parker
2018-02-27 20:01
Oh cool.

spector
2018-02-27 20:48
@dave.parker http://bit.ly/2BSVRsq -> almost done processing on YouTube

dave.parker
2018-02-27 21:07
:thumbsup:

florent.wagener
2018-02-27 21:41
does sledgehammer support python3.x ? If not is this something on the roadmap ?

wdennis
2018-02-27 21:42
Hi team - what would a stage-map look like that has `prep-install` as a step before an OS install?

wdennis
2018-02-27 21:43
(hoping that `prep-install` fixes install prob's with pre-existing disks that were used for LVM)

vlowther
2018-02-27 21:43
@florent.wagener Not out of the box -- 2.7 is present, IIRC -- it is the default for centos7 still.

florent.wagener
2018-02-27 21:43
thanks @vlowther

vlowther
2018-02-27 21:44
@wdennis discover -> prep-install -> foo-install -> local

wdennis
2018-02-27 21:46
@vlowther There will be a reboot after `prep-install` I reckon?

vlowther
2018-02-27 21:47
nope.

vlowther
2018-02-27 21:47
that is, the task won't reboot the system.

vlowther
2018-02-27 21:48
and the runner will automatically reboot into the right bootenv for the next stage.

wdennis
2018-02-27 21:49
So, this should work?
```
"change-stage/map": {
  "discover": "prep-install:Success",
  "prep-install": "ubuntu-16.04-install:Success",
  "ssh-access": "complete-nowait:Reboot",
  "ubuntu-16.04-install": "ssh-access:Success"
},
```

greg
2018-02-27 21:51
```
"change-stage/map": {
  "discover": "prep-install:Success",
  "prep-install": "ubuntu-16.04-install:Reboot",
  "ssh-access": "complete-nowait:Stop",
  "ubuntu-16.04-install": "ssh-access:Success"
},
```

greg
2018-02-27 21:51
Reboot after changing to the install stage.

wdennis
2018-02-27 21:51
ah

greg
2018-02-27 21:51
Stop the task process when the post-install stages are done.

wdennis
2018-02-27 21:54
OK, got it...
```
"change-stage/map": {
  "discover": "prep-install:Success",
  "prep-install": "ubuntu-16.04-install:Reboot",
  "ssh-access": "complete-nowait:Stop",
  "ubuntu-16.04-install": "ssh-access:Success"
}
```

greg
2018-02-27 21:56
also you don't need the ssh-access steps. It is a built-in task to the ubuntu-16.04-install

clint
2018-02-27 22:01
has joined #json

wdennis
2018-02-27 22:15
Hmmm, prep-install not blanking the install disk - still getting this when trying to reuse prior-installed disks:

wdennis
2018-02-27 22:16
I can see from the jobs log that the tasks ran before the Ubuntu install task...


wdennis
2018-02-27 22:18
This is sample job output...
```
Log for Job: 36f0b70e-055a-40c8-a288-0a224eb94343
Starting task erase-hard-disks-for-os-install on bc00245d-48e8-487a-9b5c-e59eb2b62f8d
Starting command ./erase-hard-disks-for-os-install-erase-disks
Command running
PARTIAL MODE. Incomplete logical volumes will be processed.
Reading all physical volumes. This may take a while...
Found volume group "testnode01" using metadata type lvm2
Logical volume testnode01/root contains a filesystem in use.
PV /dev/sda3 VG testnode01 lvm2 [1.82 TiB / 0 free]
Total: 1 [1.82 TiB] / in use: 1 [1.82 TiB] / in no VG: 0 [0 ]
PV /dev/sda3 belongs to Volume Group testnode01 so please use vgreduce first.
(If you are certain you need pvremove, then confirm by using --force twice.)
mdadm: Unrecognised md component device - /dev/sda3
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0144598 s, 72.5 MB/s
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0566154 s, 18.5 MB/s
mdadm: Unrecognised md component device - /dev/sda2
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0237597 s, 44.1 MB/s
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0618786 s, 16.9 MB/s
mdadm: Unrecognised md component device - /dev/sda1
1024+0 records in
1024+0 records out
524288 bytes (524 kB, 512 KiB) copied, 0.0163394 s, 32.1 MB/s
mdadm: Unrecognised md component device - /dev/sda
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0104949 s, 99.9 MB/s
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0346949 s, 30.2 MB/s
mdadm: Couldn't open /dev/sr0 for write - not zeroing
dd: failed to open '/dev/sr0': No medium found
dd: failed to open '/dev/sr0': No medium found
Command exited with status 0
Action erase-disks finished
Task erase-hard-disks-for-os-install finished
Updated job 36f0b70e-055a-40c8-a288-0a224eb94343 to finished
```
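
In this log `pvremove` declines to run because /dev/sda3 still belongs to volume group testnode01. One possible manual teardown order, sketched as a dry run that only prints commands, assuming the standard LVM2 and util-linux tools are available. This is not what the erase task itself does, and the function name and argument order are illustrative.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) a teardown order for one LVM volume
# group so the disk can be reused. Review the commands before running any of
# them by hand; they destroy data.
print_lvm_teardown() {
  local vg="$1" pv="$2" disk="$3"
  echo "vgchange -an ${vg}"      # deactivate every LV in the volume group
  echo "vgremove -f ${vg}"       # remove the volume group and its LVs
  echo "pvremove -ff -y ${pv}"   # wipe the PV label (force given twice)
  echo "wipefs -a ${disk}"       # clear remaining signatures on the disk
}

print_lvm_teardown testnode01 /dev/sda3 /dev/sda
```

If a logical volume still has a mounted filesystem, as the "contains a filesystem in use" line suggests, the deactivate step would be expected to fail until it is unmounted.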

vlowther
2018-02-27 22:28
Guess we need the --force --force --really-i-mean-it flag. :confused:

dave.parker
2018-02-27 22:43
--run --go --get-to-the-choppa

zehicle
2018-02-27 23:32
hello @clint $welcome

2018-02-27 23:32
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

clint
2018-02-27 23:32
Thanks!

amit.handa
2018-02-28 11:29
quick ques: I have a local sledgehammer tarball (from s3) which I want to deploy to another drp-instance (due to version upgrade).

amit.handa
2018-02-28 11:29
so I copy the tarball to dest instance.

amit.handa
2018-02-28 11:33
get the bootenv for 'sledgehammer'

amit.handa
2018-02-28 11:33
update the isoUrl param to point to local disk path for the tarball

amit.handa
2018-02-28 11:33
and do bootenvs update sledgehammer - < updatedbootenv.list

amit.handa
2018-02-28 11:34
I get "Error: PATCH: discovery"

amit.handa
2018-02-28 11:34
no logs on the server side as well

amit.handa
2018-02-28 11:34
had run the drp server with trace log-level

amit.handa
2018-02-28 11:35
am I doing it correctly ?

amit.handa
2018-02-28 11:35
thanks

greg
2018-02-28 13:54
Login to the portal and update the content package for the community.

greg
2018-02-28 13:55
Then from the cli run the uploadiso command from the QuickStart. The community video has Shane talking about this some yesterday @amit.handa

wdennis
2018-02-28 16:19
Using the default DRP-provided preseed partitioning map, getting a failure as so:

wdennis
2018-02-28 16:23
(Had wiped the target disk beforehand with `dd if=/dev/zero of=/dev/sda bs=1024M` )


wdennis
2018-02-28 16:25
Here is the partitioning recipe that is being used:
```
#Partitioning Scheme
d-i partman-auto/disk string /dev/sda
d-i grub-installer/choose_bootdev select /dev/sda
d-i grub-installer/bootdev string /dev/sda
d-i partman-auto/method string lvm
d-i partman-auto-lvm/guided_size string max
d-i partman-auto-lvm/new_vg_name string testnode02
d-i partman-auto/choose_recipe select custom_lvm
d-i partman-auto/expert_recipe string \
custom_lvm:: \
500 50 1024 free $iflabel{ gpt } $reusemethod{ } method{ efi } format{ } . \
128 50 256 ext2 $defaultignore{ } method{ format } format{ } use_filesystem{ } filesystem{ ext2 } mountpoint{ /boot } . \
10240 20 10240 ext4 $lvmok{ } mountpoint{ / } lv_name{ root } in_vg{ testnode02 } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \
50% 20 100% linux-swap $lvmok{ } lv_name{ swap } in_vg{ testnode02 } method{ swap } format{ } .
d-i grub-installer/only_debian boolean true
d-i partman/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
```

greg
2018-02-28 16:26
Is that the default scheme file as well?

wdennis
2018-02-28 16:27
I believe so, but let me dbl-check...

wdennis
2018-02-28 16:30
part-scheme-default.tmpl:
```
{{if .ParamExists "operating-system-disk" -}}
d-i partman-auto/disk string /dev/{{.Param "operating-system-disk"}}
d-i grub-installer/choose_bootdev select /dev/{{.Param "operating-system-disk"}}
d-i grub-installer/bootdev string /dev/{{.Param "operating-system-disk"}}
{{else -}}
d-i partman-auto/disk string /dev/sda
d-i grub-installer/choose_bootdev select /dev/sda
d-i grub-installer/bootdev string /dev/sda
{{end -}}
d-i partman-auto/method string lvm
d-i partman-auto-lvm/guided_size string max
d-i partman-auto-lvm/new_vg_name string {{.Machine.ShortName}}
d-i partman-auto/choose_recipe select custom_lvm
d-i partman-auto/expert_recipe string \
custom_lvm:: \
500 50 1024 free $iflabel{ gpt } $reusemethod{ } method{ efi } format{ } . \
128 50 256 ext2 $defaultignore{ } method{ format } format{ } use_filesystem{ } filesystem{ ext2 } mountpoint{ /boot } . \
10240 20 10240 ext4 $lvmok{ } mountpoint{ / } lv_name{ root } in_vg{ {{.Machine.ShortName}} } method{ format } format{ } use_filesystem{ } filesystem{ ext4 } . \
50% 20 100% linux-swap $lvmok{ } lv_name{ swap } in_vg{ {{.Machine.ShortName}} } method{ swap } format{ } .
d-i grub-installer/only_debian boolean true
```

wdennis
2018-02-28 16:30
So, yes

greg
2018-02-28 16:30
okay - I'll try it here in a minute.

wdennis
2018-02-28 16:31
These are reinstalls on previously-used disks; that's why I did the wipe, as they had LVM prior

wdennis
2018-02-28 16:31
So not they should be "blank" disks

wdennis
2018-02-28 16:31
not --> now

ghabian
2018-02-28 18:20
has joined #json

wdennis
2018-02-28 22:51
@greg You try an Ubuntu install yet?

greg
2018-02-28 22:59
Fighting other fires.

greg
2018-02-28 22:59
Will get to it tonight after church and soccer practice.

spector
2018-02-28 23:00
hello @ghabian $welcome

2018-02-28 23:00
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

wdennis
2018-02-28 23:00
@greg ACK

ghabian
2018-03-01 01:27
Thanks for the welcome!

amit.handa
2018-03-01 08:26
virtualbox VM is not PXE booting from DRP (bootenv: sledgehammer) after tftp'ing lpxelinux.0. VM network logs (wireshark) show the following error: `477 0.378296662 10.10.20.76 10.10.31.96 TFTP 159 Error Code, Code: File not found, Message: open /var/lib/dr-provision/tftpboot/pxelinux.cfg/16089a59-9abd-48c2-850a-2ac3bc134935: no such file or directory`

amit.handa
2018-03-01 08:27
I see that there is no such file in dr-provision file area

amit.handa
2018-03-01 08:27
What could the bug be? I might have done something wrong.

amit.handa
2018-03-01 08:27
Thanks

vlowther
2018-03-01 13:19
That is expected behavior for lpxelinux.0. see http://www.syslinux.org/wiki/index.php?title=PXELINUX#Configuration

vlowther
2018-03-01 14:52
Specifically, lpxelinux tries to fetch config files in a specific order. The first one is the DHCP client ID, which we don't use because (for servers or anything else that network boots) the client ID is a much worse unique identifier than the MAC address of the interface or the IP address the interface was assigned.
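
The lookup order can be sketched as a small function, following the order documented on the syslinux wiki page linked above: the client UUID (when the PXE stack supplies one), then ARP hardware type `01` plus the MAC address, then the IPv4 address in uppercase hex with one digit dropped per attempt, then `default`. The function name, the MAC and the IP below are sample values for illustration; only the UUID comes from the error message in this thread.

```shell
#!/usr/bin/env bash
# Sketch: list the config file names pxelinux tries, in order, for one client.
pxelinux_search_order() {
  local uuid="$1" mac="$2" ip="$3" hex
  # 1. Client UUID, lowercase, as supplied by the PXE stack
  printf 'pxelinux.cfg/%s\n' "$uuid"
  # 2. ARP type 01 (Ethernet) + MAC, lowercase hex separated by dashes
  printf 'pxelinux.cfg/01-%s\n' "$(printf '%s' "$mac" | tr 'A-F:' 'a-f-')"
  # 3. IPv4 address as 8 uppercase hex digits, then progressively shorter
  hex="$(printf '%s' "$ip" | awk -F. '{printf "%02X%02X%02X%02X", $1, $2, $3, $4}')"
  while [ -n "$hex" ]; do
    printf 'pxelinux.cfg/%s\n' "$hex"
    hex="${hex%?}"
  done
  # 4. Final fallback
  printf 'pxelinux.cfg/default\n'
}

pxelinux_search_order 16089a59-9abd-48c2-850a-2ac3bc134935 08:00:27:AA:BB:CC 192.168.2.91
```

A "File not found" on the first name is therefore normal when the server only provides one of the later files.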

greg
2018-03-01 15:53
@wdennis - tip community content has a fix for ubuntu install.

greg
2018-03-01 15:54
You can use the current content tip with stable drp.

greg
2018-03-01 15:55
I also have a PR in community content that moves all the partman options into the scheme file. I know this will break community and haven't pulled it in.

greg
2018-03-01 15:55
I also want to spend more time on it to reorg it a little more.

wdennis
2018-03-01 16:28
Thx @greg

wdennis
2018-03-01 16:30
I do think consolidating all partman directives into a single template is the sanest option... But do recognize the need to get current users on board with that change.

amit.handa
2018-03-02 12:47
thanks !

wdennis
2018-03-02 17:07
@greg Confirming new partman preseed directives in `tip` community content work now...

wdennis
2018-03-02 17:07
I got:
```
#Partitioning Scheme
d-i partman-auto/disk string /dev/sda
d-i grub-installer/choose_bootdev select /dev/sda
d-i grub-installer/bootdev string /dev/sda
d-i partman-auto/method string lvm
d-i partman-auto-lvm/guided_size string max
d-i partman-auto-lvm/new_vg_name string testnode01
d-i partman-auto/choose_recipe select atomic
d-i grub-installer/only_debian boolean true
d-i partman/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
```
and it successfully partitioned the drive.

wdennis
2018-03-02 17:53
Refresh my memory - the string value of `select-kickseed` is the template name WITH or WITHOUT the `.tmpl` at the end?

greg
2018-03-02 17:53
with

wdennis
2018-03-02 17:54
Ah, that's why it didn't work :stuck_out_tongue_winking_eye:

wdennis
2018-03-02 17:55
It's the part-scheme one that doesn't want the .tmpl

lae
2018-03-02 19:34
is it possible to have like a bootenv with some set templates and kernel parameters, but then also have several stages with an "extra" template and kernel parameter?

lae
2018-03-02 19:35
and while I was typing that out, it just hit me that I guess this is a scenario I could also solve with a drpcli runner, hm...

lae
2018-03-02 19:40
Is there a better way to get the bare DRP server IP or hostname other than parsing out `.Env.InstallUrl` or `.ProvisionerURL`?


rstarmer
2018-03-03 00:23
FYI, something is wrong in either the upstream, or more likely the ubuntu repo pointers:
```
drpcli bootenvs uploadiso ubuntu-16.04-install
Error: Unable to initiate download of http://mirrors.kernel.org/ubuntu-releases/16.04/ubuntu-16.04.3-server-amd64.iso: 404 Not Found
```

rstarmer
2018-03-03 00:23
this was from a new install against stable.

shane
2018-03-03 00:24
@rstarmer checking it ...

shane
2018-03-03 00:24
we had an issue w/ CentOS yanking the 7.3 ISOs with zero warnings

rstarmer
2018-03-03 00:25
seems .3 ISO is gone.

shane
2018-03-03 00:25
also note ... if you have a copy of the ISO (Until we update the Contents) - you can install the ISO via the UX - go to the Boot ISOs menu item

shane
2018-03-03 00:26
or you can copy the ISO to your tftpboot/isos/ directory (either in ~/drp/drpdata, or the /var/lib/dr-provision directory)

shane
2018-03-03 00:26
then restart DRP
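
A minimal sketch of the copy-then-restart route, assuming a production install rooted at `/var/lib/dr-provision` and an ISO that has already been downloaded. `stage_iso` and its arguments are illustrative names, not part of `drpcli`.

```shell
#!/usr/bin/env bash
# Sketch: copy an already-downloaded ISO into DRP's tftpboot/isos directory
# and print the destination path. The base root defaults to the production
# location mentioned above; pass another root for an isolated install.
stage_iso() {
  local iso="$1" base_root="${2:-/var/lib/dr-provision}"
  local dest="${base_root}/tftpboot/isos"
  [ -f "$iso" ] || { echo "no such ISO: $iso" >&2; return 1; }
  mkdir -p "$dest"
  cp "$iso" "$dest/"
  echo "${dest}/$(basename "$iso")"
}

# Example, followed by a restart of dr-provision (the systemctl form is an
# assumption about a systemd-based host):
#   stage_iso ./ubuntu-16.04.3-server-amd64.iso /var/lib/dr-provision
#   sudo systemctl restart dr-provision
```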

shane
2018-03-03 00:27
we'll put out an updated version of Contents shortly

rstarmer
2018-03-03 00:34
Ubuntu has the older ISOs here too for future reference: http://old-releases.ubuntu.com/releases/xenial/ubuntu-16.04.3-server-amd64.iso

shane
2018-03-03 00:34
yeah ... never mind using a symlink to point "latest" at the moving target ... sigh ...

rstarmer
2018-03-03 00:35
Question, or docs pointer if possible: The host I installed DR on has letsencrypt credentials, how do I tell DR who it actually is?

shane
2018-03-03 00:36
did you do `isolated` or production install ?

rstarmer
2018-03-03 00:36
production

rstarmer
2018-03-03 00:50
also, can I pass a domain name rather than IP address to the config?

shane
2018-03-03 00:53
@rstarmer there is a cert and key that gets installed (the self-signed). It's either in `/` or in `/var/lib/dr-provision/` directories. replace those with your certs and restart dr-provision

shane
2018-03-03 00:53
on the domain name - are you referring to `drpcli` commands ?

rstarmer
2018-03-03 00:56
no, I was thinking about advertised address from the server, I'd rather it passes out the domain name than the IP.

rstarmer
2018-03-03 00:56
was wondering if I should have passed a name with --static-ip=

shane
2018-03-03 00:56
via the `--static-ip` flag ?

rstarmer
2018-03-03 00:57
yeah, that's what I was wondering :slightly_smiling_face:

shane
2018-03-03 00:57
ah - you really don't need `static-ip` we do a bunch of magic with caching address tables and interface info - and we dynamically serve the correct IP to a client based on their network connection

shane
2018-03-03 00:58
the problem w/ using a hostname/domain - you have to make sure that resolves at the PXE firmware/boot level ... and any DNS issues will impact your provisioning activities

shane
2018-03-03 00:58
you can of course hand out DNS servers w/ the DHCP assignment ... nothing stopping you from doing that ...

rstarmer
2018-03-03 00:58
ok, won't worry there then.

shane
2018-03-03 00:58
but you'll be circumventing our magic logic

rstarmer
2018-03-03 00:59
so the key/cert are in /, this seems like a bad place to put these items. And in reality, I'd rather point to their locations in the letsencrypt directories (so that they stay up to date). Is there a config parameter I can set somewhere?

rstarmer
2018-03-03 00:59
no desire to bypass the magic!

shane
2018-03-03 01:00
```
dr-provision --help
--tls-key= The TLS Key File (default: server.key)
--tls-cert= The TLS Cert File (default: server.crt)
```

rstarmer
2018-03-03 01:00
k. will re-provision with those.

shane
2018-03-03 01:01
you can specify the location ... that's just the default

rstarmer
2018-03-03 01:23
so that doesn't seem to be working. when I restart dr-provision it regenerates new self signed keys. Note I'm passing in .pem formatted key/cert, but I also don't see any errors in trying to read them

shane
2018-03-03 01:24
@rstarmer I'm not sure if we accept a PEM format ... will have to check w/ @greg and/or @vlowther on that one

rstarmer
2018-03-03 01:24
Ok, I'm seeing the following:
```
curl -fsSL get.rebar.digital/stable | bash -s -- --tls-key=/etc/letsencrypt/live/gitlab.kumulus.co/privkey.pem --tls-cert=/etc/letsencrypt/live/gitlab.kumulus.co/cert.pem install
Overriding TLS_KEY with /etc/letsencrypt/live/gitlab.kumulus.co/privkey.pem
Overriding TLS_CERT with /etc/letsencrypt/live/gitlab.kumulus.co/cert.pem
'dr-provision' service is not running, beginning install process ...
Ensuring required tools are installed
Installing Version stable of Digital Rebar Provision
...
```

rstarmer
2018-03-03 01:28
started manually, passed the tls params, and that works.

rstarmer
2018-03-03 01:28
going to restart the service and see if it takes this time

greg
2018-03-03 01:28
You have to add to the service file

shane
2018-03-03 01:29
ah

rstarmer
2018-03-03 01:29
^which - where?

shane
2018-03-03 01:29
yeah - in `/etc/systemd/system/dr-provision.service` - assuming SystemD

greg
2018-03-03 01:29
The install.sh script doesn't do anything. We need a plan for that

shane
2018-03-03 01:30
@rstarmer the content update w/ the 16.04.4 fix will be out later this evening the PR just needs to go through approval and release process now


rstarmer
2018-03-03 01:35
ok, trying that now. I'll let you know if/as I succeed

rstarmer
2018-03-03 01:37
Yes, success, I had to update the /etc/systemd/system/dr-provision.service file:
```
[Service]
ExecStart=/usr/local/bin/dr-provision --tls-key=/etc/letsencrypt/live/gitlab.kumulus.co/privkey.pem --tls-cert=/etc/letsencrypt/live/gitlab.kumulus.co/cert.pem
```
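
An alternative sketch of the same change using a systemd drop-in instead of editing the unit file, so that a reinstall which rewrites the unit would not lose the flags. This assumes the unit has a single `ExecStart` with no other flags that need to be kept; `write_tls_dropin` and the `tls.conf` file name are illustrative, and the binary path and TLS flags are the ones quoted in this thread.

```shell
#!/usr/bin/env bash
# Sketch: write a drop-in that replaces ExecStart with a TLS-enabled command.
# The empty "ExecStart=" line clears the value inherited from the unit file
# before the new command is set.
write_tls_dropin() {
  local unit_dir="$1" key="$2" cert="$3"
  mkdir -p "$unit_dir"
  cat > "${unit_dir}/tls.conf" <<EOF
[Service]
ExecStart=
ExecStart=/usr/local/bin/dr-provision --tls-key=${key} --tls-cert=${cert}
EOF
}

# Example (as root), followed by a reload and restart:
#   write_tls_dropin /etc/systemd/system/dr-provision.service.d \
#     /etc/letsencrypt/live/gitlab.kumulus.co/privkey.pem \
#     /etc/letsencrypt/live/gitlab.kumulus.co/cert.pem
#   systemctl daemon-reload && systemctl restart dr-provision
```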

shane
2018-03-03 01:39
yay !

rstarmer
2018-03-03 01:54
now I'm stuck, the UI just spins on loading plugins

shane
2018-03-03 01:55
shift-reload

rstarmer
2018-03-03 02:41
helps a lot if I read the error message. gotta go find my glasses... (e.g. you must install providers before installing plugins...)

rstarmer
2018-03-03 06:29
interesting behavior with the terraform plugin, only the first node gets provisioned, and the system state doesn't seem to get updated completely, leaving the one provisioned node in "power off" state, though it is running in packet.

rstarmer
2018-03-03 06:33
thoughts on how I might debug this?

stanchan740
2018-03-03 12:15
I have an Ansible role I wrote to setup dr-provision on alpine/debian/centos... just updated it to install 3.7.0, but post setup steps seem to be failing (setup a new admin user, drop the rocketskates user, setup all the preferences and profiles and setup the boot environments). Seems to be auth related? Did something change from 3.6.0? The UI seems to work fine.

zehicle
2018-03-03 14:28
can you share the script?

ced.hnyda
2018-03-03 17:26
has joined #json

spector
2018-03-03 17:32
hello @ced.hnyda $welcome

2018-03-03 17:32
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

stanchan740
2018-03-03 17:52

stanchan740
2018-03-03 17:53
I flipped it back to v3.6.0 and everything works... seems to be related to the new token auth system that is in place.

greg
2018-03-03 19:14
@stanchan740 - I'll check but first pass is that the drp options changed

stanchan740
2018-03-03 20:45
Thanks for the tip... I'll try looking at the options, but I believe I took all the defaults from dr-provision --help

greg
2018-03-03 20:48
actually, that isn't it.

greg
2018-03-03 20:48
Sigh. I'm working on it now.

greg
2018-03-03 20:48
I hope to have a PR for you shortly.

stanchan740
2018-03-03 20:53
no rush... thanks for the help greg!

stanchan740
2018-03-03 21:03
are there API docs somewhere for dr-provision? if I wanted to create a custom frontend for it?

greg
2018-03-03 21:03
yes, but that is what drpcli is.

greg
2018-03-03 21:04
$docs

greg
2018-03-03 21:04
$faq

greg
2018-03-03 21:04
$faq


greg
2018-03-03 21:04
That should be close. The nav should have API or something like that.

stanchan740
2018-03-03 21:05
swagger?

greg
2018-03-03 21:06
dang it. Something broke

greg
2018-03-03 21:06
something else to look at.

greg
2018-03-03 21:06
you can hit the endpoint with /swagger-ui

greg
2018-03-03 21:06
it will have a graphical UI

stanchan740
2018-03-03 21:06
found it in the docs :slightly_smiling_face:

stanchan740
2018-03-03 21:06
thanks

greg
2018-03-03 21:07

stanchan740
2018-03-03 21:07
yah... I just noticed that the v3.7.0 was just released not too long ago :slightly_smiling_face:

shane
2018-03-03 21:09
@stanchan the `drpcli` usage is a very very good tutor for the API - the CLI is dynamically generated from the API - so the resources closely follow the API resources ... that coupled with the Swagger-UI should give you the complete picture ...

stanchan740
2018-03-03 21:12
thanks @shane

shane
2018-03-03 21:13
and ... v3.7.2 will be released shortly fixing a few minor issues

greg
2018-03-03 21:13
already out

shane
2018-03-03 21:13
dang @greg - you's too fast !!

stanchan740
2018-03-03 21:15
how is logging handled? is there existing support for a prometheus endpoint? wanted to do something like opentracing against provisioning jobs and display it in something like jaeger or zipkin... just openly thinking :slightly_smiling_face:

greg
2018-03-03 21:16
umm - well - umm . that sounds cool. We don't do that. You can grab events from a websocket stream.

greg
2018-03-03 21:16
or we can work with you on a plugin to push data as appropriate.

greg
2018-03-03 21:17
plugin to push to prometheus sounds plausible.

stanchan740
2018-03-03 21:18
:thumbsup: would be interested on working on something like that

greg
2018-03-03 21:18
What would "a prometheus endpoint" need?

greg
2018-03-03 21:18
I haven't looked at any of it. We can off-line it as well.

stanchan740
2018-03-03 21:19
that part is easy to implement... opentracing would be the interesting part

shane
2018-03-03 21:19
Plugin is definitely the best way, IMO ... but we also support Websocket events - so you can register for specific events via that standard method, details: http://provision.readthedocs.io/en/tip/doc/integrations/websocket.html#rs-websocket

stanchan740
2018-03-03 21:21
was 3.7.2 just released? it says 7 hours ago

greg
2018-03-03 21:21
yes

shane
2018-03-03 21:22
here's a websocket listener that can log out to prometheus https://github.com/closeio/socketshark

stanchan740
2018-03-03 21:26
prometheus is more for internal service metrics... distributed tracing is used for end to end transactions. all cncf projects.

stanchan740
2018-03-03 21:32
still returns an error

stanchan740
2018-03-03 21:33
``` 2018/03/03 13:31:16 &{403 Forbidden 403 HTTP/1.1 1 1 map[Date:[Sat, 03 Mar 2018 21:31:16 GMT] Content-Length:[0] Content-Type:[text/plain; charset=utf-8]] {} 0 [] false false map[] 0xc4201b6800 0xc4200a6370} ```

stanchan740
2018-03-03 21:33
I'll revert back to v3.6.0 for now for testing

greg
2018-03-03 21:33
Your playbook needs a little tweaking.

stanchan740
2018-03-03 21:34
is this valid?

greg
2018-03-03 21:34
The problem is the password setting section of the playbook

stanchan740
2018-03-03 21:34
`RS_KEY=\"admin:password\""`

greg
2018-03-03 21:34
I'm fixing other things too

greg
2018-03-03 21:34
HMM - should be, but I'm not getting the admin password set.

stanchan740
2018-03-03 21:34
or should I switch to tokens

greg
2018-03-03 21:35
You can, but I'm almost there.

greg
2018-03-03 21:35
not sure why ```drpcli -U {{ provision_admin_user }} -P \"{{ provision_admin_password }}\" prefs list >/dev/null 2>&1 && exit 0 || drpcli users password {{ provision_admin_user }} \"{{ provision_admin_password }}\" && exit 99```

greg
2018-03-03 21:35
is not working correctly.
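For reference, a sketch of the same check-then-set logic written with explicit branches, which makes the three outcomes (already set, changed, failed) easier to tell apart than the chained `&&`/`||` form. `ensure_password` is a hypothetical helper, and the two arguments stand in for the `drpcli ... prefs list` and `drpcli users password ...` commands quoted above; the 99 return code mirrors the one in that one-liner.

```shell
# ensure_password CHECK_CMD SET_CMD   (hypothetical helper)
# Returns 0 if CHECK_CMD already succeeds (nothing to change),
# 99 if SET_CMD had to run and succeeded (changed),
# 1 if both failed.
ensure_password() {
  if sh -c "$1" >/dev/null 2>&1; then
    return 0
  elif sh -c "$2" >/dev/null 2>&1; then
    return 99
  else
    return 1
  fi
}
```

In a playbook the caller would treat 99 as "changed" and anything other than 0 or 99 as a failure. Note this only restructures the client side; it would not work around a server that accepts the password call without saving it.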

stanchan740
2018-03-03 21:35
oh... that part isn't fixed yet :slightly_smiling_face:

stanchan740
2018-03-03 21:35
sorry

greg
2018-03-03 21:36
where are you getting the maperror thing?

stanchan740
2018-03-03 21:36
just running `drpcli users list`

greg
2018-03-03 21:37
okay - well if admin's password isn't set then it will have problems if you set RS_KEY=admin:password

greg
2018-03-03 21:37
don't need the extra quotes

stanchan740
2018-03-03 21:37
ah

stanchan740
2018-03-03 21:37
see the issue

greg
2018-03-03 21:37
`export RS_KEY="admin:password"`
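For reference, a small sketch of the quoting point: the shell strips the outer quotes in `export RS_KEY="admin:password"`, so the value is plain `user:password`, whereas escaped quotes end up inside the value. `check_rs_key` is a hypothetical sanity check, not part of drpcli.

```shell
# check_rs_key VALUE   (hypothetical helper)
# Accepts user:password; rejects values that carry literal quote characters,
# which is what escaped quotes on the command line would produce.
check_rs_key() {
  case "$1" in
    *\"*|*\'*) echo "bad: literal quotes in value"; return 1 ;;
    *:*)       echo "ok" ;;
    *)         echo "bad: expected user:password"; return 1 ;;
  esac
}
```

Running it as `check_rs_key "$RS_KEY"` before calling drpcli would catch the escaped-quote case early.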

stanchan740
2018-03-03 21:37
it never changed the password :slightly_smiling_face:

greg
2018-03-03 21:37
right!!

greg
2018-03-03 21:55
I feel some more unit tests and a v3.7.3 coming on

greg
2018-03-03 21:56
Can't set user passwords for some reason.

stanchan740
2018-03-03 22:16
yah... I see the same issue

stanchan740
2018-03-03 22:16
it responds back like it did something

greg
2018-03-03 22:17
yeah - in 3.7.0, to fix all the deadlocks in 3.6.0, we introduced a system to prevent that. The password save is a special pass that we didn't undo all the testing for.

stanchan740
2018-03-03 22:41
I'll use the default password for now... I tried using the API to change the password without any luck

greg
2018-03-03 22:41
yeah - the cli and the API use the same backend path.

greg
2018-03-04 03:22
@stanchan740 - I fixed the user bug and cut a v3.7.3 release. I have a pull request against your tree that does quite a few changes and fixes. It seems to work for me.

stanchan740
2018-03-04 03:44
@greg Thanks! Looks good. Will merge. I added a few things for idempotence in my working branch. Decided to call the dr-provision API directly for that since it seems to be much more predictable.

stanchan740
2018-03-04 03:49
Really enjoy using dr-provision... much better and easier to use than cobbler! It just seems to work without much fiddling around. Will be working on an awx integration to do a tensorflow on top of kubernetes demo. Looking forward to digging into the code a bit more, too.

greg
2018-03-04 03:49
cool

zehicle
2018-03-04 04:28
@stanchan740 I looked at API integration w/ Tower (AWX) earlier. It would be a natural plugin to push machine updates into the AWX including when machines are online or not. Something to talk about 1x1

lae
2018-03-04 19:54
@stanchan I'm only skimming through chat but 3.7.0 deployed fine for me with the following:
```
- name: Configure DR Provision API user
  shell: "drpcli users create {{ provision_api_user }}"
  args:
    creates: "/var/lib/dr-provision/digitalrebar/users/{{ provision_api_user }}.json"

- name: Update DR Provision API password
  shell: "drpcli -U {{ provision_api_user }} -P \"{{ provision_api_password }}\" prefs list >/dev/null 2>&1 && exit 0 || drpcli users password {{ provision_api_user }} \"{{ provision_api_password }}\" && exit 114"
  register: provision_password_update
  changed_when: provision_password_update.rc == 114
  failed_when: provision_password_update.rc != 114 and provision_password_update.rc != 0

- name: Update shell environment with DR Provision API credentials
  copy:
    content: "#!/bin/bash\nexport RS_KEY=\"{{ provision_api_user }}:{{ provision_api_password }}\""
    dest: "/etc/profile.d/dr-provision.sh"
    mode: 0755

- name: Remove default rocketskates user if different api_user is set
  shell: "drpcli users destroy rocketskates"
  args:
    removes: "/var/lib/dr-provision/digitalrebar/users/rocketskates.json"
```

lae
2018-03-04 19:54
well 3.7.1

lae
2018-03-04 19:56
(hadn't modified that since 3.2.0 or whatever was out last august)

greg
2018-03-04 20:17
@lae I think that would only work if you already have the admin user created and password set. A fresh install wouldn't work. Until 3.7.3

stanchan740
2018-03-04 20:41
yah... I usually create a new instance every time I test a change

rakeshrhcss
2018-03-05 12:17
Hello all, I am trying to install Centos 7 on one of our servers using our drp community content. It starts the automated install but asks to create a 1MB biosboot partition -> *Your BIOS-based system needs a special partition to boot from a GPT disk label. To continue, please create a 1MiB 'biosboot' type partition.*

rakeshrhcss
2018-03-05 12:18
so looks like we will have to modify our centos7 KS template to create a biosboot partition: `part biosboot --fstype biosboot --size=1 {{if .ParamExists "operating-system-disk"}}--ondisk={{.Param "operating-system-disk"}}{{end}}`

rakeshrhcss
2018-03-05 12:18
Any suggestions please.

rakeshrhcss
2018-03-05 12:24
And after some research I found when exactly the biosboot partition is required:
- did the system boot in EFI mode or BIOS mode?
  - EFI - use gpt and never make biosboot
  - BIOS - is the disk larger than the max for msdos (2TB)?
    - yes - use gpt and ensure there's a biosboot partition
    - no - use msdos

rakeshrhcss
2018-03-05 12:25
So in our preinstall script (ks) we should be looking for the EFI capabilities and the disk size.
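For reference, a minimal sketch of that pre-install decision, assuming a POSIX shell is available in the kickstart `%pre` environment. `pick_disklabel` and its output strings are hypothetical names; the threshold is the msdos limit of 2^32 sectors at 512 bytes each, and `/dev/sda` in the comments is only a placeholder device.

```shell
# msdos partition tables address at most 2^32 sectors of 512 bytes (2 TiB).
MSDOS_MAX_BYTES=2199023255552

# pick_disklabel FIRMWARE SIZE_BYTES   (hypothetical helper)
# FIRMWARE is "efi" or "bios"; prints gpt, gpt+biosboot, or msdos.
pick_disklabel() {
  firmware=$1
  size_bytes=$2
  if [ "$firmware" = "efi" ]; then
    echo "gpt"              # EFI boot: gpt, never a biosboot partition
  elif [ "$size_bytes" -gt "$MSDOS_MAX_BYTES" ]; then
    echo "gpt+biosboot"     # BIOS boot on a large disk: gpt plus biosboot
  else
    echo "msdos"            # BIOS boot on a small disk
  fi
}

# In %pre the inputs could be gathered like this (assumes these tools exist
# in the installer image):
#   if [ -d /sys/firmware/efi ]; then fw=efi; else fw=bios; fi
#   size=$(blockdev --getsize64 /dev/sda)
#   pick_disklabel "$fw" "$size"
```

The result could then decide whether the `part biosboot ...` line is written into the generated partitioning snippet.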

rakeshrhcss
2018-03-05 12:26
Please let me know your suggestions on this. Thanks.

greg
2018-03-05 14:08
@rakeshrhcss - I'm guessing your system has disks that are greater than 2TB in size.

greg
2018-03-05 14:09
My guess is for the time being, you will need to create a custom install bootenv with your own partitioning layout.

greg
2018-03-05 14:10
until someone can get to looking into templatizing the Centos installer to take custom partition sections.


stanchan740
2018-03-05 17:02
@zehicle There are multiple ways to handle inventory with awx. Ansible 2.5, which should hit rc soon, has a few changes to how inventory can be handled in awx as well. Going to have to experiment with a few implementations. The drp dynamic inventory script works fine as an inventory provider in awx, but pushing the inventory updates to awx might be a better option.

zehicle
2018-03-05 17:10
some progress to show off... Immutable Image Deploys! https://youtu.be/tDcEzirTLbo

romain.lafontaine
2018-03-05 17:55
@zehicle You made my day

stanchan740
2018-03-05 18:53
when an iso is uploaded and it shows up in the isos list, it means it's completely uploaded? is it a blocking state?

romain.lafontaine
2018-03-05 20:06
@zehicle May I ask more info about the packer-to-drp flow ? Like which output format is used on packer side, how it's digested on DRP side ?

zehicle
2018-03-05 20:07
yes! 1) Raw Format 2) you have choices, right now, it's in the ISOs area

greg
2018-03-05 20:11
@stanchan740 - if you use the drpcli command, it returns upon completion.

stanchan740
2018-03-05 20:11
just noticed that it's a blocking call :slightly_smiling_face: thanks!

greg
2018-03-05 20:12
With regard to polling from a different thread, then it will show up as `.<filename>.part` in the iso directory while it is uploading and upon completion it gets converted to `filename`
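For reference, a sketch of polling for that naming convention from another process. `iso_state` is a hypothetical helper and the ISO directory is passed in as an argument, since its location depends on the install.

```shell
# iso_state DIR NAME   (hypothetical helper)
# Prints "uploading" while DIR/.NAME.part exists, "complete" once DIR/NAME
# exists, and "absent" otherwise.
iso_state() {
  dir=$1
  name=$2
  if [ -e "$dir/.$name.part" ]; then
    echo "uploading"
  elif [ -e "$dir/$name" ]; then
    echo "complete"
  else
    echo "absent"
  fi
}
```

An installer could call this before uploading and skip the upload when the state is already `complete`.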

stanchan740
2018-03-05 20:15
for some reason, it downloads it every time... but it should skip if it's already uploaded?

stanchan740
2018-03-05 20:15
``` for _, iso := range isos { if iso == env.OS.IsoFile { return nil } } ```

greg
2018-03-05 20:16
no. It assumes that you are replacing/updating. We could probably make it smarter, but ...

greg
2018-03-05 20:16
Let me check to be sure though

stanchan740
2018-03-05 20:17
yah... I might just add a check for it in the isos directory

greg
2018-03-05 20:17
Actually, that would be a pretty good enhancement.

greg
2018-03-05 20:17
If the bootenv is already available, we don't need to upload the iso for that bootenv.

greg
2018-03-05 20:17
it would then skip it.

greg
2018-03-05 20:18
yeah - I think that makes sense and would make things simpler for the installer.

stanchan740
2018-03-05 20:20
the blobstore is currently just a wrapper around the filesystem... replaceable with S3 or minio in the future I assume?

greg
2018-03-05 20:24
Yes to the filesystem piece. In general, our philosophy is more to let you reference your files where you want.

greg
2018-03-05 20:25
You can build a bootenv that references, or use the repos params to define, the external location of the iso repos. The same with kernels and initrds. We owe some more docs and examples on this, but the point is that through the repos params, you can reference separate off-DRP locations for just about everything.

greg
2018-03-05 20:26
You could use NFS today if you wanted to maintain the DRP managed location aspect.

stanchan740
2018-03-05 20:27
cool... that makes sense

clint
2018-03-05 21:58
Really exciting stuff!

rakeshrhcss
2018-03-06 05:00
thanks @greg

rackn.slack
2018-03-06 14:26
has joined #json

rackn.slack
2018-03-06 14:32
Hi all, I'm a complete noob with this lot! I'm wondering if anyone can point me in the direction of setting up Rebar to PXE boot Rock64s. I am trying to boot ayufan linux builds... Thanks in advance!

spector
2018-03-06 14:37
hello @rackn.slack $welcome

2018-03-06 14:37
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

rackn.slack
2018-03-06 14:40
I have a Rebar server set up and running. I am building Self Hosted Kubernetes clusters with it. It is awesome!!!! I am going to provide the modified krib template when I have it thoroughly tested. I have a 20xRock64 cluster that I'm now looking at installing it onto!

spector
2018-03-06 14:45
Sounds great - the Engineering folks are driving in to work right now and will be online shortly to provide a response

shane
2018-03-06 14:45
@rackn.slack - welcome - is this what you're referring to? http://wiki.pine64.org/index.php/ROCK64_Main_Page

rackn.slack
2018-03-06 14:46
...yes that's the fellow! I have 20 of them in a box! 4Gb memory makes it all possible!

shane
2018-03-06 14:49
Are you planning to run Digital Rebar Provision (DRP) on one of these, too - or are you just planning to install to them ?

rackn.slack
2018-03-06 14:49
...I thought of that but you need 6Gb minimum!?

shane
2018-03-06 14:49
nah - I run DRP on 768 MB VMs regularly ...

shane
2018-03-06 14:51
it's possible the old Digital Rebar ver2 had a 6GB memory footprint requirement, but the current DRP ver3 does NOT

rackn.slack
2018-03-06 14:51
...I think I tried it tentatively but stopped when I got a message saying min 6Gb. I *may* have a project that could use that though...

rackn.slack
2018-03-06 14:52
my first issue is installing ayufans linux on them over PXE. Once I get that cracked I'm in business!

shane
2018-03-06 14:53
we do build DRP (ver3) for ARM64 architecture, but it does not receive heavy testing - in general - we're a single Golang binary (only 30 MB in size), with very very few external dependencies (currently only 7zip, bsdtar, and unzip)

shane
2018-03-06 14:54
we do not have any patterns for you on PXE booting ARM64 architecture, we are certainly available to help here on the #community channel as questions or issues arise ... but right now, we haven't been PXE booting ARM64 hardware

rackn.slack
2018-03-06 14:55
Ok, thanks. I'll keep on looking.

shane
2018-03-06 14:56
we def. should be able to PXE boot ARM64, just not a lot of patterns for you to follow

florent.wagener
2018-03-06 14:58
hi there! Can you guys explain to me why, after some time in the sledgehammer-wait stage, I am losing connectivity to my servers? They seem to lose their IP even though the DHCP lease is still valid. I tried to renew it using the dhclient ethx command; they get their IP back but are still impossible to ping... The only solution I have found so far is to reboot the server...

rackn.slack
2018-03-06 14:59
...I have not been able to find any! I've got as far as the DHCP server returning 'a' filename for the Vendor Class, I just don't know what it should be, or where to get it!!!!!

shane
2018-03-06 15:00
post your details here - we have some guys that are really good with hardware - may be able to sort out what needs to be served for ARM64

shane
2018-03-06 15:02
@florent.wagener - are the machines in Sledgehammer when they lose their lease, or in a provisioned OS instance ?

florent.wagener
2018-03-06 15:02
@shane they are on sledgehammer

rackn.slack
2018-03-06 15:03
What I have so far is the rom loaded with ayufan's u-boot loader. This, in theory, allows the Rock64 to PXE boot. I have a MS$ DHCP server configured to send 'a' boot file name when the Vendor Class = 'U-Boot.armv8'.... but that's as far as I have got.

shane
2018-03-06 15:04
there are a couple of things that may be at play:
* increase the Preferences setting for DHCP Lease time
* `dhclient` may not be running persistently inside the sledgehammer image (I'll have to check this)
* we use Tokens to authenticate the Machine to DRP - those Tokens have a timeout as well (but theoretically, that shouldn't affect the DHCP lease)

florent.wagener
2018-03-06 15:05
I'll try to extend the active lease time/reservation lease time and see if there is a difference :slightly_smiling_face:

shane
2018-03-06 15:06
looking at my sledgehammer, it's not running `dhclient` so it's probably a single `dhclient` lease request, which would explain the issue

shane
2018-03-06 15:06
nothing is trying to renew the lease

shane
2018-03-06 15:08
extending the Lease on the DRP side will work as long as you make it really long - but this isn't necessarily the best solution - looking in to it

florent.wagener
2018-03-06 15:08
thanks !

florent.wagener
2018-03-06 15:10
on a recently rebooted machine (less than 30min) I can see the dhclient is running: ```<sledgehammer> [root@E16968917902274 ~]# ps aux | grep dhclient root 3238 0.0 0.0 113372 13140 ? Ss 15:04 0:00 dhclient eth0```

florent.wagener
2018-03-06 15:12
on a non recently rebooted machine the dhclient isn't running: ```<sledgehammer> [root@localhost ~]# ps aux | grep dhclient root 6865 0.0 0.0 112660 972 tty1 R+ 16:22 0:00 grep --color=auto dhclient```

greg
2018-03-06 15:12
Is drpcli still running?

greg
2018-03-06 15:13
dhclient may exit when drpcli exits out of the sledgehammer service

florent.wagener
2018-03-06 15:13
no it isn't

florent.wagener
2018-03-06 15:14
on the other machine, it is: ```<sledgehammer> [root@E16968917902274 ~]# ps aux | grep drpcli root 3342 0.1 0.0 731576 17120 ? Sl 15:04 0:00 drpcli machines processjobs f8466cb9-acba-43a5-b0e8-1bcc5954b86e ```

florent.wagener
2018-03-06 15:14
that's weird because I have no job running on any machine at the time though...

florent.wagener
2018-03-06 15:15
`f8466cb9-acba-43a5-b0e8-1bcc5954b86e` is the UUID of the last job that was executed 10min ago

shane
2018-03-06 15:17
`drpcli` may continue to run, depending on your stage actions when the last stage runs

florent.wagener
2018-03-06 15:18
interesting when I try to run a drpcli command on the machine without IP, I got this error: ```Error creating sessions: CLIENT_ERROR: Get https://127.0.0.1:8092/api/v3/users/rocketskates/token: dial tcp 127.0.0.1:8092: getsockopt: connection refused```

florent.wagener
2018-03-06 15:18
the IP is wrong here...

shane
2018-03-06 15:18
if the machine has no IP - then it can't reach back out to the DRP endpoint

shane
2018-03-06 15:18
we provide the DRP Endpoint IP addr to the `drpcli` dynamically (if `--static-ip` is not set on the DRP side)

florent.wagener
2018-03-06 15:19
Got it.

florent.wagener
2018-03-06 15:19
makes sense :slightly_smiling_face:

shane
2018-03-06 15:19
So - the lesson is ... Sledgehammer brings up `dhclient` - and if the Runner (`drpcli`) isn't running, then the DHCP Lease times out ... there is also the Token expiry that the Runner (`drpcli`) is operating with, which can also time out

shane
2018-03-06 15:19
these things need to be managed for long living Sledgehammer runs

shane
2018-03-06 15:20
most of the use case patterns with Sledgehammer are quick use - and move in to installed OS ... but in the case of "ready state" infrastructure - we need to harden Sledgehammer to be longer lived with its management of Runner token and (subsequently) DHCP client lease renewal

florent.wagener
2018-03-06 15:20
Yep :slightly_smiling_face:

shane
2018-03-06 15:21
some of this can be affected with the Preferences settings by simply extending the times to really long lease lengths

vlowther
2018-03-06 15:21
@rackn.slack What type of firmware and network booting situation do those rock64 boards have?

florent.wagener
2018-03-06 15:21
Anyway this is not a big issue for our testing purposes, but it could be in production as we might need to have a "ready state" for our servers.

florent.wagener
2018-03-06 15:22
I'm gonna try to extend the lease to see if that changes anything

vlowther
2018-03-06 15:22
@florent.wagener I think it is some legacy code left over from some DHCP shenanigans we had to pull for DRv2.

vlowther
2018-03-06 15:23
If so, unwinding that behaviour should be a simple thing to do.

florent.wagener
2018-03-06 15:25
@vlowther cool! Waiting for the fix then :slightly_smiling_face:

rackn.slack
2018-03-06 15:35
@vlowther ...they have nothing installed by default. You flash them with https://github.com/ayufan-rock64/linux-build/releases/download/0.6.25/u-boot-flash-spi-rock64.img.xz to get network booting going. This all seems to work nicely.... i.e. I see the required BOOTP messages using WireShark.

greg
2018-03-06 15:42
@rackn.slack - if you are using DRP as the dhcp server, it would be really useful for us to see a debug trace of the DHCP packet stream. This can be done by turning the DHCP Debug up to debug and catching the output of DRP.

wdennis
2018-03-06 16:01
FYI - seeing this on my Ubuntu 16.04 installs - just hangs here on this screen for a while near the end of the install...


shane
2018-03-06 16:03
Ctl-Alt-F4 to see console messages

shane
2018-03-06 16:03
it's hanging in the preseed somewhere - should be a clue on screen 4

rackn.slack
2018-03-06 16:04
@greg...I'm not using the DRP as the DHCP server in this case. I have to use MS$. I am using WireShark however. What filter would you like on that? How do I get the resulting file to you?

greg
2018-03-06 16:04
just dhcp messages. @rackn.slack

greg
2018-03-06 16:04
There is a bigger issue. We don't have an arm-based sledgehammer yet.

rackn.slack
2018-03-06 16:05
...that will slow things down!

wdennis
2018-03-06 16:05
@shane Interesting- Ctrl/Alt/F4 not working to bring up terminal...

greg
2018-03-06 16:05
sometimes try ALT-F4

rackn.slack
2018-03-06 16:06
@greg is that something that is currently being worked on in V3?

wdennis
2018-03-06 16:06
Ok @greg that worked


wdennis
2018-03-06 16:09
(Sorry for the screen reflection...) looks like it's hauling down net-post-install.sh & running it

greg
2018-03-06 16:11
It has been a lower priority. We don't have one today. @rackn.slack We don't have hardware to play with it and we haven't had customer interest to drive it.

vlowther
2018-03-06 16:12
@rackn.slack It is sorta driven by customer demand and/or community involvement.

vlowther
2018-03-06 16:12
and we mostly want to target ARM64 stuff that is running a UEFI firmware

vlowther
2018-03-06 16:13
rather than other, less well documented network boot protocols.

vlowther
2018-03-06 17:35
We are working on a new job runner that may affect existing workloads, so I am looking for feedback on the design and implementation before we replace the current implementation with it.

vlowther
2018-03-06 17:37
https://github.com/digitalrebar/provision/blob/add-auto-reconnection-and-make-an-FSM-machine-agent/api/agent.go <-- the code for the new job runner. @wdennis and other interested parties should take a look at the comments in that code and give me feedback as to how it will affect your current workflows.

wdennis
2018-03-06 17:41
@vlowther What are motivations for change? (simplification?)

vlowther
2018-03-06 17:41
Easier maintenance and simplification.

greg
2018-03-06 17:42
To not have to remember the dang STOP all the time.

greg
2018-03-06 17:42
:slightly_smiling_face:

vlowther
2018-03-06 17:43
This runner will do what I consider to be the Right Thing by waiting by default when there is nothing to do, rebooting whenever the bootenvs change (unless it should exit instead to let an OS install finish)

wdennis
2018-03-06 17:44
I do not believe it will affect anything I'm currently doing; I only have a simple workflow (not default, which in my implementation is empty, but tied to a custom profile) and the KRIB one

wdennis
2018-03-06 17:47
This is my custom profile workflow: ``` "change-stage/map": { "discover": "prep-install:Success", "prep-install": "ubuntu-16.04-install:Reboot", "ubuntu-16.04-install": "complete-nowait:Success" } ```

wdennis
2018-03-06 17:49
And my KRIB one: ``` "change-stage/map": { "docker-install": "krib-install:Success", "finish-install": "docker-install:Success", "krib-install": "complete:Success", "runner-service": "finish-install:Stop", "ssh-access": "runner-service:Success", "ubuntu-16.04-install": "ssh-access:Success" } ```
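For reference, a sketch that walks the KRIB map above in order, assuming each entry reads as `current-stage -> next-stage:Action`. `next_for` and `walk_stages` are hypothetical helpers that only mirror the JSON by hand; they do not talk to DRP.

```shell
# next_for STAGE: prints the "next-stage:Action" value for STAGE, copied by
# hand from the KRIB change-stage/map above. Fails for unmapped stages.
next_for() {
  case "$1" in
    ubuntu-16.04-install) echo "ssh-access:Success" ;;
    ssh-access)           echo "runner-service:Success" ;;
    runner-service)       echo "finish-install:Stop" ;;
    finish-install)       echo "docker-install:Success" ;;
    docker-install)       echo "krib-install:Success" ;;
    krib-install)         echo "complete:Success" ;;
    *) return 1 ;;
  esac
}

# walk_stages START: follows the map from START until a stage has no entry,
# printing one "stage -> next (Action)" line per transition.
walk_stages() {
  stage=$1
  while entry=$(next_for "$stage"); do
    echo "$stage -> ${entry%%:*} (${entry##*:})"
    stage=${entry%%:*}
  done
}
```

Running `walk_stages ubuntu-16.04-install` lists the six transitions in execution order, which makes the single `Stop` at `runner-service` easy to spot.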

greg
2018-03-06 17:52
Thank you, @wdennis - you answered a question for me.

vlowther
2018-03-06 19:04
@florent.wagener https://github.com/digitalrebar/provision-content/pull/68 <-- should be in tip soonish.

florent.wagener
2018-03-06 19:04
@vlowther w00t !

wdennis
2018-03-06 20:58
So @greg is the "curtin" thing something you guys invented?

greg
2018-03-06 21:06
No. It is a tool in ubuntu/maas. MaaS wraps a lot of crap around it.

greg
2018-03-06 21:06
We drive it for now. It was quick and expedient. I want to eventually replace it.

greg
2018-03-06 21:07
I've been looking at ignition from CoreOS (now Red Hat), but it has issues too.

greg
2018-03-06 21:07
I've tinkered with changing ignition and have some stuff lying around to switch to it, but it has more work to do.

greg
2018-03-06 21:08
I like ignition better because it is go-based and easier to work with but it doesn't have the pieces to handle setting up bootloaders like curtin does.

greg
2018-03-06 21:08
Anyway, it is the way it is for right now.

greg
2018-03-06 21:08
@wdennis

wdennis
2018-03-06 21:09
The idea is very cool, congrats on shipping something :+1::skin-tone-2:

wdennis
2018-03-06 21:12
In other news, all my Ubuntu installs (which are the only kind I do ;) are hanging on the net-post-install.sh phase...


greg
2018-03-06 21:12
yeah - I saw. I tried it and it passed for me.

wdennis
2018-03-06 21:13
This wasn't happening to me pre-3.7

greg
2018-03-06 21:13
hmm

greg
2018-03-06 21:13
can you `alt-f2`

greg
2018-03-06 21:13
ps -ef | grep drpcli

wdennis
2018-03-06 21:13
"Works on my machine" ;)

greg
2018-03-06 21:14
Trying to think about changes that could change your path.

greg
2018-03-06 21:14
From this work flow: ```"change-stage/map": { "discover": "prep-install:Success", "prep-install": "ubuntu-16.04-install:Reboot", "ubuntu-16.04-install": "complete-nowait:Success" }```


greg
2018-03-06 21:15
I prefer to do `complete-nowait:Stop`.

greg
2018-03-06 21:15
We thought what you have should have worked, but we might have been sneaky and changed something.

wdennis
2018-03-06 21:16
I'll change it as you prefer, and see what I get

greg
2018-03-06 21:20
@wdennis - what version are you running?

greg
2018-03-06 21:27
Did your machine make it to complete-nowait in the UX while it was hung?

greg
2018-03-06 21:29
I think I see the change that might have changed the behavior. STOP should fix it.

greg
2018-03-06 21:30
@vlowther's new stuff will also fix the problem.

greg
2018-03-06 21:31
The new stuff will fix it in two ways.

wdennis
2018-03-06 21:40
@greg Yes, it shows as `complete-nowait` in the UX

wdennis
2018-03-06 21:41
I just reboot the nodes, and they're fine

wdennis
2018-03-06 21:42
Also, running v3.7.1 right now - I'll upgrade soon(tm)

killsudo
2018-03-07 04:28
has joined #json

killsudo
2018-03-07 04:31
When does the /var/lib/dr-provision/tftpboot/machines/$UUID folder get created and the kickstart file stuck in it?

shane
2018-03-07 04:31
@killsudo $welcome

2018-03-07 04:31
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

shane
2018-03-07 04:32
Kickstarts and all templated files are never rendered to disk - they're rendered on the fly by DRP

shane
2018-03-07 04:32
and served from the in-memory virtual filesystem

killsudo
2018-03-07 04:32
I've got my DR-Provision setup and working with some test vm's. I can get an ip via dhcp from rebar and see it load ipxe then sledgehammer

killsudo
2018-03-07 04:33
but after the machine is registered and I change its bootenv to centos7 and set the ks template to centos-7.ks.tmpl and reboot the test vm, centos7 never installs


shane
2018-03-07 04:34
are you familiar with the `drpcli` command ?

killsudo
2018-03-07 04:34
interesting.. this very much like VMwares AutoDeploy for vCenter for automating ESXi

killsudo
2018-03-07 04:35
yea i've been playing with drpcli and getting comfortable with it

shane
2018-03-07 04:35
(except MUCH MUCH better ....) :slightly_smiling_face:

killsudo
2018-03-07 04:35
My end goal is to wrap up the drp API for StackStorm

shane
2018-03-07 04:35
can you please provide `drpcli prefs list` output

killsudo
2018-03-07 04:36

shane
2018-03-07 04:36
if you have the defaultStage setting set - then you need to manipulate what machines are going to be installed with via the use of Stages (workflow), and not BootEnvs

killsudo
2018-03-07 04:37
hmm yea I clicked that wizard button and was just trying to find better doc's on how to use that workflow tool

shane
2018-03-07 04:37
since you have defaultStage set, you need to set a Stage on the machine, which contains a BootEnv definition - do not manipulate the Machine via BootEnv settings

shane
2018-03-07 04:37
we don't have the workflow fully documented yet

killsudo
2018-03-07 04:38
ok so I'm not crazy, my research ended up at http://provision.readthedocs.io/en/stable/doc/workflows.html

shane
2018-03-07 04:38
for Docs - please switch to `tip` version - they are MUCH MUCH more updated

killsudo
2018-03-07 04:38
which while nice, doesn't help that much if you are already very familiar with dhcp/pxe/autodeploy etc

shane
2018-03-07 04:39
(use the lower right floating version selector)

killsudo
2018-03-07 04:40
so I guess workflows are optional then?

killsudo
2018-03-07 04:40
my use case might be slightly different than a normal user

shane
2018-03-07 04:41
"technically" workflow (stages) can be optional, but we have found everyone adopts them once they understand them

greg
2018-03-07 04:42
`defaultBootEnv` should be sledgehammer to be consistent.

killsudo
2018-03-07 04:42
I don't want/need a TheForeman/Cobbler etc experience with lifecycle management. When I start a provisioning run I have already acquired all of my variables from different inventories etc so I mostly need a nice API driven way to define a machine and when it boots just pull down the os with a firstboot script etc. Once I get into the OS to pre-stage some stuff I have to back out of the OS cleanly and hand off the machine to a third party

killsudo
2018-03-07 04:43
so no leaving keys etc and zero need to be able to come back to it in the future

killsudo
2018-03-07 04:43
fire and forget style setups

killsudo
2018-03-07 04:44
ideally if I can tie a 'machine' to a dhcp option 82 string, then assign a bootenv to get the OS on disk, all the better

killsudo
2018-03-07 04:44
then I don't even need to futz with subnets or mac's

shane
2018-03-07 04:46
I think you'll still find that Stages and workflow are useful for you. Particularly if your "first boot script" needs to vary between machines.

zehicle
2018-03-07 04:46
considering your request, the stages/runner would do exactly what you are asking for post provision. No keys, no ssh, no return access

shane
2018-03-07 04:46
You can build the workflow to do common boot things, select the OS, then do the post-provisioning twiddly bits. Our install method is good because our agent dissolves by default after provisioning - and you don't need to leave artifacts behind

killsudo
2018-03-07 04:47
I already have my OS's 100% handled in Ansible beautifully including backing myself out gracefully. So mostly just need winrm with https on windows and ssh on linux/esxi so stackstorm can launch my ansible/powershell plays

killsudo
2018-03-07 04:47
so this is sounding good

killsudo
2018-03-07 04:50
so Default BootEnv in global properties is now sledgehammer, thanks @greg

killsudo
2018-03-07 04:50
ok so cleared out my workflows

killsudo
2018-03-07 04:50
so should I still use the wizard or start by building a simple single line flow

zehicle
2018-03-07 04:56
the wizard builds a basic one

greg
2018-03-07 04:56
@killsudo - let's be clear on what you are trying to do.

greg
2018-03-07 04:56
You already have an inventoried, IPMI-managed, RAID-configured, BIOS-adjusted system that you control from another system or systems.

killsudo
2018-03-07 04:56
+1 - http://provision.readthedocs.io/en/tip/doc/arch/provision.html tip has much better write up and starting to make some sense

killsudo
2018-03-07 04:57
@greg yup lets assume that

greg
2018-03-07 04:58
You want to PXE boot these machines (triggered externally) and the result of some intervening process is a booted OS (of some type) with an ssh key (or winrm enablement).

greg
2018-03-07 04:58
I assume you also probably want some notification that this is done.

killsudo
2018-03-07 04:58
notification would be a nice extra

killsudo
2018-03-07 04:59
I can listen for or poll events (stackstorm uses OpenStack Mistral under the hood for workflows)

greg
2018-03-07 05:00
How do you expect to inform the black box of the OS selection and ssh key to use?

greg
2018-03-07 05:00
I assume the ssh key will be consistent and constant to your secondary provisioner.

killsudo
2018-03-07 05:01
depends on the logic once I start manipulating it

killsudo
2018-03-07 05:01
but yea the key could be static to start with and just a simple edit to the kickstart

killsudo
2018-03-07 05:02
not hard for me to have the OS call my stackstorm hook to notify it that it's alive and send along some info

killsudo
2018-03-07 05:02
I plan to put the mistral workflow into pause state while I wait around for the boot/dhcp/pxe/os load part

killsudo
2018-03-07 05:03
then I can resume when I see the incoming call

greg
2018-03-07 05:03
okay - now we are getting somewhere. A `task` that calls home when ready.
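A minimal sketch of the "call home" step discussed above, assuming the freshly installed OS can reach an HTTP webhook on the orchestration side. The hook URL, the payload fields, and the function name are all hypothetical; nothing here is a documented StackStorm or DRP interface. The function only builds the request, so the paused workflow could be resumed by whatever receives it.

```python
import json
import urllib.request


def build_call_home_request(hook_url, machine_name, ip_address):
    """Build (but do not send) a POST announcing that a machine is up.

    hook_url and the payload keys are illustrative assumptions.
    """
    payload = {"machine": machine_name, "ip": ip_address, "status": "ready"}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        hook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example with placeholder values; urllib.request.urlopen(req, timeout=10)
# would actually send it from the first-boot script.
req = build_call_home_request(
    "https://orchestrator.example.invalid/hooks/machine-ready",
    "testvm01",
    "192.0.2.10",
)
```

Keeping the request construction separate from the send makes the payload easy to check before wiring it into a kickstart `%post` section or a task.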

greg
2018-03-07 05:03
How do you want to install the OS?

killsudo
2018-03-07 05:03
you mean from where or do I want to use disk images?

greg
2018-03-07 05:04
well - kickstarts/preseeds, raw images, rootfs images?

greg
2018-03-07 05:04
immutable in-memory CoreOS/Rancher?

killsudo
2018-03-07 05:05
probably more than one, I have access to the mac addresses and know the OS before talking to drp

greg
2018-03-07 05:05
ips?

killsudo
2018-03-07 05:05
yup I can also know that

killsudo
2018-03-07 05:05
or I can just use my option-82 string that my dhcp relays are injecting

killsudo
2018-03-07 05:05
and map that to an OS

greg
2018-03-07 05:06
yep - both are options. Requiring different paths.

killsudo
2018-03-07 05:06
yea that's where I am stuck is trying to understand all of my options

greg
2018-03-07 05:07
Are you using DRP as a DHCP server?

killsudo
2018-03-07 05:08
I do have a working dhcp/pxe system now, it's just old school centos/isc-dhcp/pxe with some bash and ansible automating it

killsudo
2018-03-07 05:08
I can if that's best

killsudo
2018-03-07 05:08
but no issue using relays and next-boot / next-server options in regular dhcp

greg
2018-03-07 05:08
Do the machines have sane IPXE clients?

killsudo
2018-03-07 05:09
does that exist?

greg
2018-03-07 05:09
:slightly_smiling_face: fair enough.

greg
2018-03-07 05:09
http bzimage support

killsudo
2018-03-07 05:09
yup, all of my current systems chainload ipxe

killsudo
2018-03-07 05:10
or they can if that's what you're getting at

killsudo
2018-03-07 05:10
it's all enterprise kit

greg
2018-03-07 05:10
yeah.

killsudo
2018-03-07 05:10
more concerned about uefi stuff (argh)

greg
2018-03-07 05:10
well there is uefi ipxe bootloader

killsudo
2018-03-07 05:10
if it works

killsudo
2018-03-07 05:11
I know Hyper-v Gen2 vm's will be a pain with uefi being enforced

greg
2018-03-07 05:11
VM's uefi suck

killsudo
2018-03-07 05:11
I think my first test run with drp, those failed to even reach ipxe

killsudo
2018-03-07 05:11
my other test machines seemed happy with the defaults

greg
2018-03-07 05:11
They all seem to assume a disk

greg
2018-03-07 05:13
To reduce your initial learning space, you may want to use your existing DHCP services to chain load into sledgehammer/discovery.

killsudo
2018-03-07 05:14
well I have my test lab already successfully working with DRP dhcp service

greg
2018-03-07 05:15
Okay - well - I was more thinking for your IP management system that you already have, but either way.

greg
2018-03-07 05:15
I'd let the machines get discovered. Default to the `discover` stage.

greg
2018-03-07 05:15
Transition that to a new stage with a new task that acts as an in-line classifier.

killsudo
2018-03-07 05:18
what is an 'in-line classifier'? The template system?

greg
2018-03-07 05:19
parse your option 82 and convert to an install stage. (centos-7-install or whatever). Transition that install stage to a stage that installs your post reboot hook and reboot the machine.

greg
2018-03-07 05:20
`in-line classifier` == stage with a task represented by a template that would parse the option 82 out of the dhcp lease file and convert it to an install stage.
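A rough sketch of the classification logic only, assuming the option 82 string has already been extracted from the lease. In DRP this would live in a task template run by a stage; the prefix table and the fallback choice below are hypothetical, while the stage names are the ones mentioned in this channel.

```python
# Hypothetical mapping from option 82 string prefixes to install stages.
STAGE_BY_OPTION82_PREFIX = {
    "rack1-centos": "centos-7-install",
    "rack1-ubuntu": "ubuntu-16.04-install",
}

# Assumed fallback: park unknown machines in the holding stage.
DEFAULT_STAGE = "sledgehammer-wait"


def classify(option82):
    """Return the install stage for a relay-supplied option 82 string."""
    for prefix, stage in STAGE_BY_OPTION82_PREFIX.items():
        if option82.startswith(prefix):
            return stage
    return DEFAULT_STAGE


print(classify("rack1-centos/port7"))
```

A table lookup keeps the per-site knowledge in data, so adding an OS means adding a row rather than editing the task logic.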

killsudo
2018-03-07 05:20
lets start simpler so I can build up to that. I need to get my head wrapped around the basic 1.2.3 operations. otherwise my questions are not gonna be as helpful or concise

killsudo
2018-03-07 05:21
dhcp > pxe > discovery > centos7

killsudo
2018-03-07 05:22
whats the workflow way to just discover every machine that network boots against drp and install centos7 automatically with the default ks

greg
2018-03-07 05:22
set the default stage to centos-7-install

killsudo
2018-03-07 05:23
what about if I want to hit 'discover' first then centos7?

greg
2018-03-07 05:23
that is where you will need a workflow

killsudo
2018-03-07 05:23
ok so on that page and loaded up the default global workflow with the wizard

greg
2018-03-07 05:23
Read about workflows and `ssh-access`.

greg
2018-03-07 05:25
You aren't using virtualbox

greg
2018-03-07 05:26
Will want this:

greg
2018-03-07 05:26
discover->centos-7-install:Reboot

greg
2018-03-07 05:27
centos-7-install->finish-install:Stop
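The two entries above follow a `from->to:Action` shape. As an illustration of how such an entry breaks down, here is a small parser sketch; the `Transition` type and function name are made up for this example and are not part of drpcli.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    from_stage: str
    to_stage: str
    action: str


def parse_transition(entry):
    """Split 'from->to:Action' into its three parts."""
    left, _, action = entry.rpartition(":")
    from_stage, arrow, to_stage = left.partition("->")
    if not (arrow and from_stage and to_stage and action):
        raise ValueError(f"not a from->to:Action entry: {entry!r}")
    return Transition(from_stage, to_stage, action)


for line in ("discover->centos-7-install:Reboot",
             "centos-7-install->finish-install:Stop"):
    print(parse_transition(line))
```

Read this way, the pair says: after `discover`, move to `centos-7-install` and reboot; after `centos-7-install`, move to `finish-install` and stop.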

killsudo
2018-03-07 05:29
ok so seeing that

killsudo
2018-03-07 05:29
would 'discover->sledgehammer:reboot|centos7-install:reboot->finish-install:Stop' also be acceptable?

greg
2018-03-07 05:30
we don't have a sledgehammer stage.

killsudo
2018-03-07 05:30
*sledgehammer-wait*

greg
2018-03-07 05:31
sledgehammer-wait is just a holding stage. It is meant as a near-line waiting system.

killsudo
2018-03-07 05:32
ok, so that's more of a live environment I could break into and do actions on the box before proceeding with the OS install

killsudo
2018-03-07 05:32
like updating firmware or something if I was to write that task up

greg
2018-03-07 05:32
or heaven forbid automate them. :slightly_smiling_face:

shane
2018-03-07 05:32
as Stages !

greg
2018-03-07 05:33
RackN has stages that do that and more.

greg
2018-03-07 05:34
Sledgehammer is pretty rich so you could use stackstorm to trigger things there if you wanted.

killsudo
2018-03-07 05:34
yea your previous statement cleared it up, I was confused as to what the actual difference between discovery and sledgehammer was

greg
2018-03-07 05:35
Discover will put ssh keys in place if you define enough parameters.

killsudo
2018-03-07 05:35
I was thinking sledgehammer was like the autodeploy boot env that loads and sends the ipxe details back to vcenter and must be booted the first time thru on an unknown machine

greg
2018-03-07 05:36
it can be used that way, but not required.

greg
2018-03-07 05:36
You could directly create machines and just jump to the centos install

greg
2018-03-07 05:36
That was another path

killsudo
2018-03-07 05:37
well if you guys like the workflows I'll give them a chance

greg
2018-03-07 05:39
most people want a lifecycle around their machines. Workflows really help with that.

killsudo
2018-03-07 05:44
hmm still no go, still failed to fetch ks

killsudo
2018-03-07 05:45
and $url:8091/machines/8bc9b109-0034-42d6-847b-91d716d65333/seed just 404's

shane
2018-03-07 05:45
Kickstart is rendered as `compute.ks` (not seed)

shane
2018-03-07 05:46
for centos-7-install ... for an ubuntu-16.04-install you'll get `seed`
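A small sketch of the naming rule just described, assuming the rendered file is served under `/machines/<uuid>/` on the HTTP port as in the 404 example above. The lookup table covers only the two stages named here; the base URL is a placeholder.

```python
# Rendered installer file per install stage, as stated in the chat.
RENDERED_FILE_BY_STAGE = {
    "centos-7-install": "compute.ks",
    "ubuntu-16.04-install": "seed",
}


def rendered_template_url(base_url, machine_uuid, stage):
    """Build the URL an installer would fetch for this machine and stage."""
    filename = RENDERED_FILE_BY_STAGE[stage]
    return f"{base_url.rstrip('/')}/machines/{machine_uuid}/{filename}"


print(rendered_template_url(
    "http://192.0.2.1:8091",
    "8bc9b109-0034-42d6-847b-91d716d65333",
    "centos-7-install",
))
```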

killsudo
2018-03-07 05:46
there we go

killsudo
2018-03-07 05:46
so it did generate the ks, and this time my vm went thru discovery then auto-rebooted and pxe booted into centos7

killsudo
2018-03-07 05:46
so progress

killsudo
2018-03-07 05:47
the machine details also show it inheriting the centos7 bootenv from the workflow

killsudo
2018-03-07 05:48
weird.. 'curl: (23) Failed writing body (8520 != 16384)'

greg
2018-03-07 05:48
Depending on file and timing, it can change.

shane
2018-03-07 05:49
the kickstart/preseed is only available to be rendered during a provisioning activity

shane
2018-03-07 05:49
I actually create a "phantom" machine I can place in to various BootEnvs to render templates against to test template rendering

killsudo
2018-03-07 05:51
that curl was from dracut inside the testvm when it tried to locate ks

killsudo
2018-03-07 05:51
apparently 1gb of ram isn't enough with the fs

shane
2018-03-07 05:51
nope

greg
2018-03-07 05:51
Correct

shane
2018-03-07 05:51
not for CentOS 7 - you need 1.5 GB

shane
2018-03-07 05:51
(that's what I use on my test VMs)

greg
2018-03-07 05:51
sledgehammer is rich

shane
2018-03-07 05:51
Ubuntu is ok w/ 1 GB

killsudo
2018-03-07 05:51
yup you guys are on it, set 4gb and now centos7 is running KS

greg
2018-03-07 05:51
I usually use 2GB. It is designed for real servers.

killsudo
2018-03-07 05:52
yea not an issue, I can change those values on vm's before and after the install

killsudo
2018-03-07 05:54
I assume windows and esxi are supported well enough if you're willing to put in the grunt work?

shane
2018-03-07 05:54
both are RackN commercial components ...

killsudo
2018-03-07 05:54
looked like the explode script knew how to handle those isos

killsudo
2018-03-07 05:54
what does that mean? it'll never work without some addon?

shane
2018-03-07 05:54
we support Windows via Image based deployment (eg Immutable Infrastructure)

shane
2018-03-07 05:55
there are no community based Open Source pieces to do ESXi or Windows

killsudo
2018-03-07 05:56
but can someone implement the details themselves if they want to?

killsudo
2018-03-07 05:56
if all you want is to get pxe to launch the winpe stuff and image the disk?

shane
2018-03-07 05:57
it should be possible with the right effort

killsudo
2018-03-07 05:57
I'll look into the RackN stuff, I assume questions regarding esxi/windows isn't really for this #community channel then?

killsudo
2018-03-07 05:58
is it only windows that has a commercial piece?

shane
2018-03-07 06:01
Both Windows and ESXi - both have required a lot of engineering work to get working repeatably and at scale.

killsudo
2018-03-07 06:03
yea I bet, both gave me a headache on my current pxe system

killsudo
2018-03-07 06:08
heyo, first box is online. not too bad now that I'm grasping the steps

killsudo
2018-03-07 07:30
odd one with this Hyper-V Gen2 vm.. I can see that DRP serves it up the proper efi boot file via option 67 and then ipxe loads but after ipxe tries to fetch '' it bombs out

killsudo
2018-03-07 07:31
dr-provision2018/03/07 07:24:38.728050 [3271:665]static [error]: /home/travis/gopath/src/github.com/digitalrebar/provision/midlayer/tftp.go:82

killsudo
2018-03-07 07:31
dr-provision[3877]: [3271:665]TFTP: ipxe.efi: transfer error: sending block 0: code=8, error: User aborted the transfer

greg
2018-03-07 12:50
@killsudo - check in the tftpboot directory for that file. It should be there.

vlowther
2018-03-07 14:59
@killsudo What was the error on the ipxe side?

vlowther
2018-03-07 15:00
Our pxe serving logic for UEFI has been built around "whatever recent-ish tianocore expects", and I have no idea if hyperv uses a tianocore based UEFI stack or if it rolls its own.

vlowther
2018-03-07 15:12
and that "User aborted the transfer" error you are seeing is usually just the firmware or ipxe getting the size of the thing it wants to load next.

shane
2018-03-07 16:16
- quick poll ... would you like to see Immutable Provisioning demo on the next meetup? (Immutable = image based deployments). Please vote on the meetup page: https://www.meetup.com/digitalrebar/polls/1263248/

florent.wagener
2018-03-07 18:47
following up on the DHCP issue I mentioned yesterday (even though I know that you have fixed it in the last tip version). Extending the lease seems to do the trick !

shane
2018-03-07 18:48
Yep - it's a band-aid though ... and not necessarily the right solution for some environments that want short renewal cycles on their leases ... but, glad that works for the moment :slightly_smiling_face:

shane
2018-03-07 18:48
we'll have this fix in a release out soon ... or you can update to `tip` to get it

florent.wagener
2018-03-07 19:37
I'm working on something else right now but I will definitely test it :slightly_smiling_face:

killsudo
2018-03-07 22:18
@vlowther I think it might be related to this - http://git.ipxe.org/ipxe.git/commitdiff/9366578

vlowther
2018-03-07 22:21
Possible. We pull in the latest ipxe.efi whenever we cut a build, so given the patch date that fix should be in there.

vlowther
2018-03-07 22:21
Doesn't mean that patch hasn't been reverted or broken by something else in the meantime, though. :confused:

killsudo
2018-03-07 22:23
yea looks like hyper-v is gonna have to remain a virt disk copy as that comment doesn't sound reassuring until I can revisit it

killsudo
2018-03-07 22:25
I usually try and start stuff with hyper-v as an edge case cause if it works in hyper-v I can pretty much make it work on anything else

shane
2018-03-07 22:25
That's kind of like saying "I hit myself in the face with a shovel before digging a hole, because I like the feeling it gives me ... "

vlowther
2018-03-07 22:26
yeah, I go with something a little more mainstream, like qemu+kvm

vlowther
2018-03-07 22:26
:slightly_smiling_face:

killsudo
2018-03-07 22:31
heh it's what you live with when hyper-v is involved. Now I get to finally move on to testing proxmox and vmware. Both of their networking layers are virtualized. proxmox is openvswitch with vlans dumping into a evpn/vxlan fabric. vsphere is the same with some nsx involved and dumping into a evpn/vxlan fabric. All subnet gateways are anycasted gateways(using evpn) on juniper mx routers

killsudo
2018-03-07 22:32
This topology works awesome right now with the pxe booting of physical servers but the virt side throws a few extra hops in the way that *should work*

greg
2018-03-07 22:35
@killsudo - if you already have DHCP relays working in those environments, then it should work pretty cleanly.

killsudo
2018-03-07 22:35
yea it's more making sure ovs or nsx doesn't eat the broadcast

killsudo
2018-03-08 07:28

killsudo
2018-03-08 07:28
Looks like this *should* be doable on Gen2 hyper-v

greg
2018-03-08 14:14
@killsudo - if you want to try different `ipxe.efi`, you can just place them in the tftpboot directory. If you want them to persist over drp restarts, you need to place them in the `replace` directory as well.

greg
2018-03-08 14:15
If you are in `isolated` mode, these are in the directory you started drp from.

greg
2018-03-08 14:15
If you are in `production` mode, these are in `/var/lib/dr-provision`.

greg
2018-03-08 14:15
If you find one that works, let us know where you got it from. :slightly_smiling_face:

dave.parker
2018-03-08 22:07
Hey all. I'm trying to create a bootable iPXE image that I can use to manually point a machine at a dr-provision host on another subnet. I'm doing this because I can't run a DHCP server on the subnet the host to be discovered lives on.

dave.parker
2018-03-08 22:07
Oh and for extra complications, it's also a UEFI boot only system.

dave.parker
2018-03-08 22:08
I have the image booting and am trying to chain boot bootx64.efi. But I get the error ```no config file found in forcing interactive mode due to config file error(s) ELILO boot: .............```

dave.parker
2018-03-08 22:08
Then just an endless string of dots.

dave.parker
2018-03-08 22:08
It never seems to get any farther.

dave.parker
2018-03-08 22:09
So it downloads bootx64.efi from the dr-prov server, but can't go any farther apparently.

greg
2018-03-08 22:14
@dave.parker - There is ProxyDHCP for this case if needed. It can run in parallel and only hand out boot options.

dave.parker
2018-03-08 22:14

greg
2018-03-08 22:15
@dave.parker - Second, I think we've dropped support for bootx64.efi because it was buggy. And switched to `ipxe.efi` because it worked better. @vlowther will have better info.

dave.parker
2018-03-08 22:15
Ok.

dave.parker
2018-03-08 22:16
I can't install anything in this subnet really. So I don't know that I can do the ProxyDHCP either.

greg
2018-03-08 22:16
okay - so - how do you control what it is going to boot the first time?

dave.parker
2018-03-08 22:17
With a bootable iso, with an iPXE executable that runs that script I posted.

greg
2018-03-08 22:18
okay - so you could try using ipxe.efi with a pxe script that points to `http://<ip>:8091/default.ipxe`

greg
2018-03-08 22:18
where `<ip>` is your DRP endpoint.

vlowther
2018-03-08 22:18
yeah, that. :slightly_smiling_face:

greg
2018-03-08 22:18
and `8091` is your HTTP port on the DRP endpoint.
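A sketch of what such a script could contain, generated from the endpoint address. It assumes the client can get an address from some DHCP server before chaining (the `dhcp` line); a statically addressed setup would need different iPXE commands. The function name is hypothetical and 8091 is just the port mentioned above.

```python
def render_chain_script(drp_ip, http_port=8091):
    """Return an iPXE script that chains to default.ipxe on the DRP endpoint."""
    lines = [
        "#!ipxe",
        "dhcp",  # assumption: an address is available via DHCP
        f"chain http://{drp_ip}:{http_port}/default.ipxe",
    ]
    return "\n".join(lines) + "\n"


# 192.0.2.5 is a documentation-range placeholder for the DRP endpoint.
print(render_chain_script("192.0.2.5"))
```

The output can be embedded into the bootable iPXE image in place of the script that chained `bootx64.efi`.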

dave.parker
2018-03-08 22:19
Got it

greg
2018-03-08 22:19
The `default.ipxe` file is rendered with enough to handle discovery and directed boot operations.

greg
2018-03-08 22:20
It is provided by the `discovery` bootenv. It looks like this:

greg
2018-03-08 22:21

greg
2018-03-08 22:21
The loopback addresses are just because it is how I asked for it.

greg
2018-03-08 22:23
It basically looks for ipxe files on the DRP endpoint for the machine specifically (`<ip>.ipxe` and `<mac>.ipxe`). Bootenvs are supposed to provide one of those files for IPXE boots.

greg
2018-03-08 22:23
If they don't exist, then it boots sledgehammer and starts discovery.

dave.parker
2018-03-08 22:23
Ok, cool.

dave.parker
2018-03-08 22:24
I'll give that a try.

dave.parker
2018-03-08 22:31
Hey, that kind of worked.

dave.parker
2018-03-08 22:32
It grabbed sledgehammer and booted into it. But it then set the IP address to 0.0.0.0 and is just looping `Sending discover...` over and over

greg
2018-03-08 22:34
sledgehammer expects DNS domain in the DHCP options, I think.

greg
2018-03-08 22:35
You can log into sledgehammer from the console with root/rebar1.

greg
2018-03-08 22:35
To see what DHCP client is getting upset about.

greg
2018-03-08 22:35
That may be where you are.

dave.parker
2018-03-08 22:43
I can't get in because it just keeps trying to send discover. I never get a login prompt

dave.parker
2018-03-08 23:07
So the sledgehammer image is always going to try to configure the network with DHCP? So if I can't have a DHCP server on the network (or there's one there I don't control) will sledgehammer not work?

greg
2018-03-08 23:12
It needs DHCP and what you have should work. There are a couple of things going on.

greg
2018-03-08 23:13
We made a change to let dhcp and drpcli run separately in sledgehammer. This is good, but may be preventing drpcli from running. It is a new change. Probably a bug.

greg
2018-03-08 23:13
The second is that our dhcp server requires one option (I think): DNS domain.

greg
2018-03-08 23:14
We can look at removing that requirement.

greg
2018-03-08 23:18
We are just a little busy with some other things at the moment.

dave.parker
2018-03-08 23:23
I don't have any subnets configured on my dr-provision server currently because I'm testing these remote installs. Do I need one configured anyway?

greg
2018-03-08 23:24
No. It should be fine.

greg
2018-03-08 23:25
There are bugs for your use case. It would be nice to know what options your dhcp server is sending

dave.parker
2018-03-08 23:40
I don't even see anything on the server end. Nothing in the logs at all.

dave.parker
2018-03-08 23:41
Oh well I'm going to come back to this tomorrow.

zerocarbthirty
2018-03-09 14:27
has joined #json

spector
2018-03-09 14:29
Welcome @zerocarbthirty $

spector
2018-03-09 14:30
sorry, $Welcome

2018-03-09 14:30
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

zerocarbthirty
2018-03-09 14:35
Are there shims or example entrypoints for DRP that show things like: "This is where you put your script that SSH's into a switch and configures ip dhcp-helper on an interface so that a machine will pxeboot when it reboots" or "this is the URL where DRP can retrieve information about the network interfaces for a machine" or is defining all of the tasks left to the user?

spector
2018-03-09 14:45
Thanks for the question, I am on our short morning meeting with engineering, they will be checking this channel shortly.

vlowther
2018-03-09 15:30
@zerocarbthirty We provide several common tasks as part of the community content

vlowther
2018-03-09 15:32
For example, getting the information about network interfaces for a machine is done as part of initial system discovery, which runs the gohai task.

vlowther
2018-03-09 15:33
We don't have one that configures the DHCP helper on a switch -- our usual assumption is that everything is using DHCP all the time anyways, and that the networking guys don't want us touching their switches.

shane
2018-03-09 15:38
@zerocarbthirty - If you have the admin access to the switches in question - and if you have existing tooling to be able to implement those changes, you can build a Stage that would make switch port changes as part of the Workflow aspects of Digital Rebar Provision

vlowther
2018-03-09 15:39
More generally, we expect that each environment will need some customized content (tasks, stages, etc) and workflows to accomplish your deployment goals.

shane
2018-03-09 15:47
however, as @vlowther states, we don't have precanned components in the community tooling that does that - there is a bit of a chicken-and-egg issue, if you are not DHCPing your host against DRP, you'd need to build Reservations for the machine in advance, and add a machine in advance of it booting against us - and your first Stage would be to configure the Switch port to dhcp relay - then power on

shane
2018-03-09 15:48
you either have to have DHCP and our Discovery mechanisms to add machines in to DRP - or you have to build the information in advance to manage the Machine - then make the switch port changes and boot the machine for provisioning

mark.yeun
2018-03-09 17:36
has joined #json

shane
2018-03-09 17:36
@mark.yeun $welcome

2018-03-09 17:36
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

mark.yeun
2018-03-09 17:37
howdy, very cool stuff you're doing :slightly_smiling_face:

shane
2018-03-09 17:38
thanks !

mark.yeun
2018-03-09 17:38
i hope you don't mind me just popping in to ask questions...

vlowther
2018-03-09 17:40
Well, since you ask nicely. :slightly_smiling_face:

mark.yeun
2018-03-09 17:42
:slightly_smiling_face: I've got dr-provision working nicely in a libvirt environment. I'm trying to get it working on metal. I have the server set up, dhcp relay is working. I have a serial console to my baremetal box. When I pxeboot, I see on tcpdump that dhcp works, and tftp for lpxelinux.0 works.

mark.yeun
2018-03-09 17:42
then... nothing

mark.yeun
2018-03-09 17:42
I believe the next thing that should happen is another tftp for a series of files, which should fail, then a tftp for pxelinux.cfg/default

shane
2018-03-09 17:42
fw or iptables blocking ports 8091 and 8092 on the DRP endpoint ?

mark.yeun
2018-03-09 17:43
wide open

vlowther
2018-03-09 17:43
Yep, that is what you should see.

mark.yeun
2018-03-09 17:45
so my theory is lpxelinux.0 doesn't like my hardware?

vlowther
2018-03-09 17:45
What version of dr-provision are you running, what hardware are you testing, and is it booting via UEFI or legacy BIOS?

shane
2018-03-09 17:45
...and what does your serial console on your Machine show ?

mark.yeun
2018-03-09 17:45
i _think_ it's legacy BIOS.

mark.yeun
2018-03-09 17:46
```
$ dr-provision --version
dr-provision2018/03/09 17:45:45.087689 Version: v3.7.3-tip-5-eb82a0429c7c94bb1885cc32528c15e376417138
```

vlowther
2018-03-09 17:46
Cool.

mark.yeun
2018-03-09 17:46
and I don't have access to vga -- only serial console

vlowther
2018-03-09 17:46
No worries.

shane
2018-03-09 17:46
vga is only for winders ... a real OS only needs a serial console ...

vlowther
2018-03-09 17:46
heh

mark.yeun
2018-03-09 17:46
lol nice

mark.yeun
2018-03-09 17:47
do you want to see the subnet def?

mark.yeun
2018-03-09 17:47
nothing special

vlowther
2018-03-09 17:47
What does the subnet definition look like?

vlowther
2018-03-09 17:47
yes. :slightly_smiling_face:

mark.yeun
2018-03-09 17:47
haha
```
{
  "ActiveEnd": "10.10.24.171",
  "ActiveLeaseTime": 60,
  "ActiveStart": "10.10.24.151",
  "Available": true,
  "Description": "",
  "Enabled": true,
  "Errors": [],
  "Meta": {},
  "Name": "kube1_subnet",
  "NextServer": "",
  "OnlyReservations": false,
  "Options": [
    { "Code": 1, "Value": "255.255.254.0" },
    { "Code": 3, "Value": "10.10.24.1" },
    { "Code": 6, "Value": "10.40.20.101,10.40.20.102" },
    { "Code": 15, "Value": "http://tower-research.com" },
    { "Code": 28, "Value": "10.10.25.255" }
  ],
  "Pickers": [ "hint", "nextFree", "mostExpired" ],
  "Proxy": false,
  "ReadOnly": false,
  "ReservedLeaseTime": 7200,
  "Strategy": "MAC",
  "Subnet": "10.10.24.0/23",
  "Unmanaged": false,
  "Validated": true
}
```
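A quick way to sanity-check a subnet definition like this one offline, sketched with the standard `ipaddress` module. It only covers the addressing fields (the active range, plus DHCP options 1, 3 and 28, which are subnet mask, router and broadcast address); the function name is made up and this is not something drpcli does for you.

```python
import ipaddress


def check_subnet(subnet):
    """Return a list of addressing inconsistencies in a DRP-style subnet dict."""
    net = ipaddress.ip_network(subnet["Subnet"])
    options = {opt["Code"]: opt["Value"] for opt in subnet.get("Options", [])}
    problems = []
    for key in ("ActiveStart", "ActiveEnd"):
        if ipaddress.ip_address(subnet[key]) not in net:
            problems.append(f"{key} {subnet[key]} is outside {net}")
    if 1 in options and ipaddress.ip_address(options[1]) != net.netmask:
        problems.append(f"option 1 {options[1]} does not match {net.netmask}")
    if 3 in options and ipaddress.ip_address(options[3]) not in net:
        problems.append(f"option 3 router {options[3]} is outside {net}")
    if 28 in options and ipaddress.ip_address(options[28]) != net.broadcast_address:
        problems.append(f"option 28 {options[28]} is not {net.broadcast_address}")
    return problems


# Addressing fields copied from the definition posted above.
kube1 = {
    "Subnet": "10.10.24.0/23",
    "ActiveStart": "10.10.24.151",
    "ActiveEnd": "10.10.24.171",
    "Options": [
        {"Code": 1, "Value": "255.255.254.0"},
        {"Code": 3, "Value": "10.10.24.1"},
        {"Code": 28, "Value": "10.10.25.255"},
    ],
}
print(check_subnet(kube1))
```

For the values posted here the list comes back empty, which matches the assessment that the definition itself is not the problem.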

vlowther
2018-03-09 17:48
That is fine for 3.7.3

vlowther
2018-03-09 17:49
I wanted to make sure it wasn't forcing the wrong bootloader or something like that.

vlowther
2018-03-09 17:50
Can you throw a screenshot of the failed boot into the channel?

mark.yeun
2018-03-09 17:50
it's blank

vlowther
2018-03-09 17:50
...

mark.yeun
2018-03-09 17:50
serial

mark.yeun
2018-03-09 17:50
so F12, then blank

vlowther
2018-03-09 17:51
so the serial console shows the nic firmware doing its thing, loads lpxelinux.0, then goes blank?

shane
2018-03-09 17:51
^^^

mark.yeun
2018-03-09 17:51
yessir

shane
2018-03-09 17:51
you need to set serial console on the DRP side

mark.yeun
2018-03-09 17:51
actually, as usual, my bios is blanking the screen after POST

vlowther
2018-03-09 17:52
hm

mark.yeun
2018-03-09 17:52
I've set serial console, and I see it in the rendered pxelinux.cfg/default

mark.yeun
2018-03-09 17:52
but i'm not getting that far

vlowther
2018-03-09 17:52
can whatever you are serial consoling into the system with capture the output?

vlowther
2018-03-09 17:52
ah

vlowther
2018-03-09 17:52
Yeah, I suspect it is what shane mentioned then.

mark.yeun
2018-03-09 17:52
lpxelinux.0 doesn't download the config file, so it doesn't know to output on serial

vlowther
2018-03-09 17:53
We should make having a serial console enabled the default one of these fine days.

mark.yeun
2018-03-09 17:53
and even if it did, the rendered pxelinux.cfg/default has console=... for the kernel but not for itself

mark.yeun
2018-03-09 17:54
```
$ tftp REDACTEDHOSTNAME -c get pxelinux.cfg/default
$ cat default
DEFAULT discovery
PROMPT 0
TIMEOUT 10
LABEL discovery
  KERNEL sledgehammer/9743e672ff33179cd5218d8fe506c03cf2a31d18/vmlinuz0
  INITRD sledgehammer/9743e672ff33179cd5218d8fe506c03cf2a31d18/stage1.img
  APPEND rootflags=loop root=live:/sledgehammer.iso rootfstype=auto ro liveimg rd_NO_LUKS rd_NO_MD rd_NO_DM provisioner.web=http://10.40.20.30:8091 -- console=ttyS1,115200n8
  IPAPPEND 2
```

greg
2018-03-09 17:54
Another thing to try would be to set option 67 in the subnet to `ipxe.pxe` and see if that works.

shane
2018-03-09 17:54
is your serial console *actually* on ttyS1 ? not ttyS0 ?

shane
2018-03-09 17:54
ttyS1 is kinda non-standard - it's what http://packet.net baremetal systems use by default

mark.yeun
2018-03-09 17:54
tried both :slightly_smiling_face:

mark.yeun
2018-03-09 17:55
i tried this, no change in behavior `drpcli subnets set kube1_subnet option 67 to '{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}lpxelinux.0{{else}}bootx64.efi{{end}}'`

vlowther
2018-03-09 17:55
wow, that old string is still out there.

vlowther
2018-03-09 17:56
try {{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}ipxe.pxe{{else}}ipxe.efi{{end}}

shane
2018-03-09 17:56
are you using a USB to Serial converter, or an actual 9-pin serial port ?


mark.yeun
2018-03-09 17:57
actually i have an avocent

shane
2018-03-09 17:57
um ... that's my fault ...

mark.yeun
2018-03-09 17:58
ok trying that option67 value

mark.yeun
2018-03-09 17:59
``` { "Code": 67, "Value": "try {{if (eq (index . 77) \"iPXE\") }}default.ipxe{{else if (eq (index . 93) \"0\")}}ipxe.pxe{{else}}ipxe.efi{{end}}" } ``` gonna give it a go

greg
2018-03-09 17:59
too much

greg
2018-03-09 17:59
get rid of try

vlowther
2018-03-09 17:59
there is no try. :slightly_smiling_face:

mark.yeun
2018-03-09 17:59
lol

mark.yeun
2018-03-09 17:59
do or do not

zerocarbthirty
2018-03-09 17:59
What backend system does DRP use to store information gathered with gohai/etc ?

greg
2018-03-09 18:00
This command: `drpcli subnets set kube1_subnet option 67 to '{{if (eq (index . 77) "iPXE") }}default.ipxe{{else if (eq (index . 93) "0")}}ipxe.pxe{{else}}ipxe.efi{{end}}'`
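For anyone reading the template in that command, the decision it encodes can be written out as plain code. This is only a restatement of the logic for illustration, with a made-up function name: option 77 is the DHCP user class and option 93 is the client system architecture, where "0" means legacy BIOS x86.

```python
def pick_bootfile(options):
    """Mirror the option 67 template: choose a boot file from client options.

    options maps DHCP option codes to the string values the client sent.
    """
    if options.get(77) == "iPXE":
        # The client is already running iPXE, so hand it the script.
        return "default.ipxe"
    if options.get(93) == "0":
        # Legacy BIOS PXE client.
        return "ipxe.pxe"
    # Anything else is treated as UEFI.
    return "ipxe.efi"


print(pick_bootfile({93: "0"}))
```

The first branch is what stops a chain-loaded iPXE from being handed the iPXE binary again in a loop.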

greg
2018-03-09 18:00
It uses a file-based data store.

greg
2018-03-09 18:00
@zerocarbthirty

zerocarbthirty
2018-03-09 18:01
Oh, I figured it was using Collins or something like that.

greg
2018-03-09 18:01
@zerocarbthirty - we try to keep it minimal / lightweight for embedding in things.

vlowther
2018-03-09 18:01
Nope. We want complete standalone operation + a zero-dependency install.

greg
2018-03-09 18:02
We can use different backend but they need to be more blob store like than Collins.

vlowther
2018-03-09 18:02
so the dr-provision binary embeds all the things it must have to install.

zerocarbthirty
2018-03-09 18:05
So I asked the question earlier about having DRP configure switchports for DHCP when it needs to pxeboot a system rather than having DHCP always on. I've been watching a lot of the videos about DRP (that use packet as infrastructure) and it seems a little odd that a network of that size would want every port forwarding DHCP traffic from every host to the DHCP server all the time. Although I guess they probably don't use DR for their own DHCP/pxe so maybe they do actually configure the ports.

shane
2018-03-09 18:06
yes, they configure switch ports dynamically for every single host - every single host by default is isolated in a /31 size L3 boundary and VXLAN separated (updated my typo to /31 boundary)

shane
2018-03-09 18:06
Digital Rebar Provision can configure ports just like they do ... it would be part of the Stages of workflow that you'd write for content

mark.yeun
2018-03-09 18:06
lol i thought you were making a yoda joke but I get it now. removed the "try ", retrying

shane
2018-03-09 18:09
the open source Digital Rebar Provision out-of-the-box does not have switch management built in

mark.yeun
2018-03-09 18:10
boys i see many packets


zerocarbthirty
2018-03-09 18:11
Hmm I was under the impression that they didn't actually use any L2 VLAN/VXLAN stuff so that people can use their own overlays.

zerocarbthirty
2018-03-09 18:12
but it's amusing to me that they use /31s just because of NANOG's history of arguing about whether or not anyone should ever use /31s haha

mark.yeun
2018-03-09 18:12
i forgot to put serial console back on ttyS0, but the machine showed up in drpcli machines list

mark.yeun
2018-03-09 18:13
will give it another crack with proper serial

mark.yeun
2018-03-09 18:13
but you are the _man_

mark.yeun
2018-03-09 18:13
seems I'd have been stuck without the slack inquiry

mark.yeun
2018-03-09 18:13
thanks guys! much appreciated.

greg
2018-03-09 18:14
What type of hardware is it? @mark.yeun

mark.yeun
2018-03-09 18:14
supermicro X10 series with mellanox 10g

shane
2018-03-09 18:14
hmm - the mellanox drivers may not be in the open source centos/ubuntu - have you checked that ?

greg
2018-03-09 18:14
lpxelinux.0 might not like the mellanox. `ipxe.pxe` is "newer"

shane
2018-03-09 18:14
I know they were a problem (especially Ubuntu 14.x)

shane
2018-03-09 18:15
which Mellanox cards ?

zerocarbthirty
2018-03-09 18:16
Anyway, earlier I wasn't asking about whether DRP has the ability to configure networking equipment I was asking whether there were example workflows that illustrate where one might just pop in their own code to look stuff up from IPAM/configure switches, etc

mark.yeun
2018-03-09 18:17
02:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

shane
2018-03-09 18:17
should be good - that's an older gen card - which I think is supported in the mlx4 driver

shane
2018-03-09 18:18
(it might be in the mlx5 driver - but both are in CentOS7 by default ... pretty sure ubuntu 16 too)

mark.yeun
2018-03-09 18:18
one more quick question -- I haven't looked into this yet. is IPMI support behind the pay wall?

vlowther
2018-03-09 18:18
Yep.

zerocarbthirty
2018-03-09 18:19
Hmm, I think crowbar used to manage ipmi

zerocarbthirty
2018-03-09 18:19
odd that they would decide to make that a paid for feature

greg
2018-03-09 18:20
We have to have some encouragement to move to revenue generating customers.

mark.yeun
2018-03-09 18:20
okay, if this sexy thing keeps looking sexy i'll hit up your sales team for at least ballpark pricing :slightly_smiling_face:

greg
2018-03-09 18:21
You can play with it in the content bundles of IPMI.

greg
2018-03-09 18:21
@mark.yeun - that is amazingly easy. Seeing as you've talked with over half the team. :slightly_smiling_face:

mark.yeun
2018-03-09 18:21
okay that triggers one more question. what sort of features do we get with the paywall bios support?

vlowther
2018-03-09 18:22
Dell firmware updates and BIOS config are there presently

2018-03-09 18:22
Time to feed the :bear:!

vlowther
2018-03-09 18:22
seeing as how that is what we have the most experience with.

zerocarbthirty
2018-03-09 18:23
@mark.yeun You should be able to just boot into a thin linux environment like Alpine and then use whatever vendor's tools to configure your BIOS by just having alpine curl a script that builds the config based on either the MAC or IP of the client.

zerocarbthirty
2018-03-09 18:24
omconfig is especially easy to do that with

vlowther
2018-03-09 18:24
I have implemented Supermicro support (via sum) for DRv2 and an earlier customer, and have been waiting for demand to port that support over to dr-provision.

mark.yeun
2018-03-09 18:24
@zerocarbthirty thanks, we have such stuff for our old heavy OS builds. lots of witchcraft though with our mixed hardware platforms

zerocarbthirty
2018-03-09 18:24
I don't know how far along redfish is but some of that may be standardized now

greg
2018-03-09 18:25
@zerocarbthirty - or you could use our tools and workflows to automate and orchestrate. Or build a task/stage that runs your own tools as part of the process. DRP is flexible.

vlowther
2018-03-09 18:25
ya, our BIOS and RAID plugins go through an in-house tool that implements a standard format and idempotent config flow on top of vendor-provided tooling.

shane
2018-03-09 18:25
@zerocarbthirty...it all sounds so _easy_ on paper .... :stuck_out_tongue_winking_eye:

vlowther
2018-03-09 18:26
yeah, we take care of blundering into and padding all the sharp bits so you don't have to.

zerocarbthirty
2018-03-09 18:27
and i'm guessing that also Windows is paywalled too?

vlowther
2018-03-09 18:27
so you can (for instance) hand us a JSON blob representing a RAID config and we take care of driving megacli/storcli/whatever to make that config happen.

vlowther
2018-03-09 18:27
Ditto for BIOS config.

zerocarbthirty
2018-03-09 18:28
You mean in the event that we have access to the rackn stuff

greg
2018-03-09 18:28
@zerocarbthirty - yes, image-based deploys are paywalled. Because they are tricky and usually require consulting.

zerocarbthirty
2018-03-09 18:29
oh, we have it down to where we boot into PE and use command-line DISM to do phase 1 then the system reboots and finishes the install. We didn't find it very complicated even using pxelinux

greg
2018-03-09 18:30
Okay - then that is good for you. We've found that building images and deploying them is much faster and our customers like that workflow better. It fits into CICD pipelines better and gives better controls.

zerocarbthirty
2018-03-09 18:30
I assume you are referring to non .wim images?

greg
2018-03-09 18:31
Correct - though I have prototypes that deploy those as well.

zerocarbthirty
2018-03-09 18:34
So you are referring to images that you drop onto the disk when the install is past what is commonly referred to as the 'PE' phase?

mark.yeun
2018-03-09 18:34
hey I want to thank you guys again. will go have a play, and may come back for more magic.

vlowther
2018-03-09 18:35
@zerocarbthirty: yes.

vlowther
2018-03-09 18:36
@mark.yeun: No problemo.

zerocarbthirty
2018-03-09 18:41
That is interesting, I haven't really looked for different imaging formats than .wim since we used ghost+floppies to do installs but I suppose if you include the right drivers or only use hardware that has native drivers in $target_windows_version that method could be a bit faster.

shane
2018-03-09 18:42
It is extremely fast - only requires one reboot of the machine to provision to a completed OS, and we support adding in post-provisioning bootstrap config changes as well

shane
2018-03-09 18:42
along with our Agent, with which you can enable longer term lifecycle management (if desired; by default our Agent "dissolves" after initial provisioning)

shane
2018-03-09 18:43
coupled with a CI/CD pipeline to validate/test your "gold" image, you can roll out patch updates very very quickly across very large scale infrastructure

shane
2018-03-09 18:44
you also can roll forward/roll back images quickly through this mechanism in the event a new image exhibits behavior problems "in the wild"

zerocarbthirty
2018-03-09 18:45
So, pardon me if I am nosy but you would take the image when it is at the "Please wait while we are setting everything up for you" phase which I believe they still call OOBE and then have an execution configured to grab a script in powershell or (whatever) to execute the changes?

zerocarbthirty
2018-03-09 18:47
I suppose one could also just roll cloud-init into an imagine for physical hosts

zerocarbthirty
2018-03-09 18:47
err image

greg
2018-03-09 18:48
Hmmm. Maybe. :grinning:

zerocarbthirty
2018-03-09 18:50
although you would want to avoid joining the domain with such a host until after the computer name is changed for obvious reasons

greg
2018-03-09 18:50
If that is a concern. Most definitely

zerocarbthirty
2018-03-09 18:57
Discovery uses open sledgehammer? does that do cdp/lldp to gather network info?

shane
2018-03-09 18:58
lldp is in the Sledgehammer image so you can definitely build a Stage to collect switch port info

zerocarbthirty
2018-03-09 18:59
Cool, i'm having the team install 40 PowerEdge 440s to play with DRP next week so I'm just trying to think of everything I am going to wonder about ahead of time.

shane
2018-03-09 19:00
Stages can be used to integrate in to external DCIM/Asset Mgmt/Cfg Mgmt databases as well - we can either push inventory info to them, or pull info to build provisioning decisions against

shane
2018-03-09 19:00
(and IPAM as well)

zerocarbthirty
2018-03-09 19:03
The problem I find mostly is that the team will rack 500 servers and then make a bunch of mistakes describing them whether it's the actual specs, the network info, whatever. so that's really the part that is the biggest pain in the ass. I didn't realize this until today but Packet never changing their hardware after it is installed makes a lot of things a whole lot easier.

shane
2018-03-09 19:03
yep, I know that pain ...

shane
2018-03-09 19:04
correlating the physical infrastructure design from what it _should_ be to what it really is (or isn't; if it's broken) ... can be a mess

shane
2018-03-09 19:04
you can audit via Sledgehammer what the reality of your network ports are and compare that to a "it should be this" design
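
A rough Python sketch of that kind of audit, assuming the collected switch port info and the "it should be this" design can both be reduced to a simple mapping. The data shape here, (host, nic) to (switch, port), and the sample values are made up for illustration; they are not a DRP or lldpd format.

```python
def audit_ports(expected, observed):
    """Compare designed cabling against what discovery reported.

    Both arguments are hypothetical dicts mapping (host, nic) to (switch, port).
    Returns a list of (key, expected_value, observed_value) mismatches;
    an entry missing on either side shows up as None.
    """
    problems = []
    for key in sorted(set(expected) | set(observed)):
        want = expected.get(key)
        got = observed.get(key)
        if want != got:
            problems.append((key, want, got))
    return problems


design = {("node1", "eth0"): ("sw1", "Eth1/1"), ("node2", "eth0"): ("sw1", "Eth1/2")}
reality = {("node1", "eth0"): ("sw1", "Eth1/1"), ("node2", "eth0"): ("sw1", "Eth1/7")}
for key, want, got in audit_ports(design, reality):
    print(key, "expected", want, "found", got)
```

Taking the union of keys means the audit catches both miscabled ports and ports that exist in only one of the two views.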

shane
2018-03-09 19:05
we use the LLDPD implementation which supports LLDP, CDP, EDP, SONMP, and FDP protocols

zerocarbthirty
2018-03-09 19:05
Well, if you devote a portion of your DC to being totally 'fixed' i.e. you won't change the specs/won't change the network, etc it makes a lot of that trivial

zerocarbthirty
2018-03-09 19:06
but if you are constantly adding/removing drives+ram+pci-e cards, etc

zerocarbthirty
2018-03-09 19:06
your inventory is going to get out of date pretty quickly

shane
2018-03-09 19:07
one could implement use of our Agent (i.e. not let it dissolve), and built in `gohai` inventory to report back to your provisioning service or other external services on a periodic basis

shane
2018-03-09 19:08
then you can support continual sweeping of inventory management - these are some of the larger lifecycle management solutions that can be built around the Agent if desired

zerocarbthirty
2018-03-09 19:08
my initial thought on that was to have a maintenance mode that would cause it to boot back into the discovery to update the inventory but then I just figured out that we could probably just make a certain percentage of stuff not changeable

shane
2018-03-09 19:09
Sledgehammer is designed to support that use, and DRP is designed to let you in-memory/live boot systems to "do stuff", then boot them back to their installed OS easily enough

shane
2018-03-09 19:09
but that's a fairly disruptive pattern to have ...

zerocarbthirty
2018-03-09 19:09
I always find it humorous that if you add RAM to a dell server it takes a new inventory but there is no way to tell it to send that information anywhere

2018-03-09 19:09
Time to feed the :bear:!

shane
2018-03-09 19:10
use of the Agent reporting back to DRP can centralize and manage that information as opposed to the rebooting - our Agent is crosscompiled for Arm, Intel, 32, 64bit, Linux, Windows, and Darwin (Mac) currently - pretty easy to port to other things if desired

zerocarbthirty
2018-03-09 19:10
Oh, yeah man but it's not like you are going to upgrade RAM on a server while it's running

greg
2018-03-09 19:10
I would call those workflows. :grinning:

zerocarbthirty
2018-03-09 19:11
so there will be some amount of disruption anyway

zerocarbthirty
2018-03-09 19:11
and if it's something that can't be disrupted it should be running on block storage that is accessible from more than a single host anyway

shane
2018-03-09 19:12
that is a good modern design pattern, but sadly, not all shops are there ...

zerocarbthirty
2018-03-09 19:15
Yeah earlier I was mostly just asking if there were example workflows that are just missing the rackn pieces

zerocarbthirty
2018-03-09 19:15
like "this would be where rackn did this cool thing"

zerocarbthirty
2018-03-09 19:15
that would make it easy to figure out what pieces we have/need to build/want to buy

zerocarbthirty
2018-03-09 19:23
You guys have piqued my interest on this windows image thing, must find file format :smiley:

shane
2018-03-09 19:27
this Tuesday we will be demo'ing our Image Deployment capabilities - including the Windows Image deploy on the weekly Meetup

greg
2018-03-09 19:33
@zerocarbthirty - The default community content has stages and tasks in it. You can see the tasks / stages in the RackN content that is available by logging into the RackN SaaS.

zehicle
2018-03-09 22:47
When you create an account, you'll have access to some licensed content on a trial basis from the catalog.

zehicle
2018-03-09 22:48
The account turns on "registration wall" UX features.

zehicle
2018-03-10 03:03
little bonus features.... added live Task view to Bulk Edit

zehicle
2018-03-10 03:03
It will be in the latest after we test it a bit

jweber
2018-03-11 16:57
has joined #json

spector
2018-03-11 16:58
$Welcome

2018-03-11 16:58
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

florent.wagener
2018-03-12 14:08
hey guys, quick question, I have 3 burnin jobs that are stuck in a "running" state and I can't delete them. How can I force them into a fail state ?

florent.wagener
2018-03-12 14:08
I've tried this: `drpcli jobs destroy ce9c668a-7888-4308-84c6-e1591deedc0d --force` without success :disappointed:

florent.wagener
2018-03-12 14:10
never mind, I found it: `drpcli jobs update ce9c668a-7888-4308-84c6-e1591deedc0d '{"State": "failed"}'`
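
With several stuck jobs, the same call can be repeated per UUID. A minimal Python sketch, assuming only the `drpcli jobs update <uuid> '{"State": "failed"}'` form quoted above. The helper names are hypothetical, and nothing is executed unless `dry_run` is turned off.

```python
import json
import shlex
import subprocess


def fail_job_command(job_uuid):
    """Build the argv for the `drpcli jobs update` call quoted above."""
    return ["drpcli", "jobs", "update", job_uuid, json.dumps({"State": "failed"})]


def fail_jobs(job_uuids, dry_run=True):
    """Mark each job as failed; with dry_run=True only return the commands."""
    commands = [fail_job_command(uuid) for uuid in job_uuids]
    if not dry_run:
        for argv in commands:
            # Needs drpcli on PATH and working endpoint credentials.
            subprocess.run(argv, check=True)
    return commands


for argv in fail_jobs(["ce9c668a-7888-4308-84c6-e1591deedc0d"]):
    print(shlex.join(argv))
```

Printing the commands first makes it easy to check the UUID list before anything is changed on the endpoint.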

amit.handa
2018-03-13 06:41
am trying to setup kubernetes cluster on virtualbox VMs (3 VMs)

amit.handa
2018-03-13 06:41
have completed kubernetes setup via kubespray playbooks

amit.handa
2018-03-13 06:42
however, unable to access the dashboard

amit.handa
2018-03-13 06:42
it says 'unauthorized'. Any idea how to fix it ?

amit.handa
2018-03-13 06:42
Thank you !

amit.handa
2018-03-13 06:42
``` { "kind": "Status", "apiVersion": "v1", "metadata": { }, "status": "Failure", "message": "Unauthorized", "reason": "Unauthorized", "code": 401 } ```

greg
2018-03-13 12:56
@amit.handa - I think you need some kubeproxy magic, but not sure. Something like this may help you start. https://github.com/kubernetes/dashboard/issues/692

amit.handa
2018-03-13 12:58
thanks greg. Let me look for cure

shane
2018-03-13 13:11
@amit.handa are you trying to access via the `kubectl proxy` command ? http://provision.readthedocs.io/en/tip/doc/integrations/krib.html#kubernetes-dashboard-via-proxy

amit.handa
2018-03-13 13:12
I am opening it via https://<kube-master-ip>:6443

amit.handa
2018-03-13 13:12
as mentioned in the drp docs

wayneeseguin
2018-03-13 14:00
has joined #json

zehicle
2018-03-13 15:26
@amit.handa if kubespray changed security defaults then the docs will be out of date. The integration just supplies the inventory. It's likely they tweaked the auth system

shane
2018-03-13 15:27
@wayneeseguin $welcome

2018-03-13 15:27
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

amit.handa
2018-03-13 15:58
thanks @zehicle for the info, I am new to kubernetes. I'll check and update.

amit.handa
2018-03-13 17:18
thanks @greg

amit.handa
2018-03-13 17:19
I stopped the docker proxy container and ran the following on the master node

amit.handa
2018-03-13 17:19
```kubectl proxy --address 0.0.0.0 --accept-hosts '.*'```

amit.handa
2018-03-13 17:19
I can see the dashboard

jpresley
2018-03-13 17:41
has joined #json

jpresley
2018-03-13 17:57
I'm exploring use of digital rebar to provision bare metal machines in the office and in the data center. Is it a lot of effort to use digital rebar when the local network already has a dhcp server? My experience is more in devops and software provisioning rather than traditional ops

zehicle
2018-03-13 18:59
@jpresley there are several ways to handle shared DHCP, including setting nextboot from your existing DHCP server instead of DRP, or using DRP as a DHCP proxy to add nextboot instructions to DHCP requests

zehicle
2018-03-13 18:59
I think some of those are in the $faq


wdennis
2018-03-13 19:13
@jpresley I am using DRP on a subnet with existing DHCP server - I just am setting "next server" and "default file name" params to appropriate values

greg
2018-03-13 19:30
@jpresley - proxy dhcp may work for you

lcrozzoli
2018-03-13 23:11
has joined #json

shane
2018-03-14 00:49
@lcrozzoli $welcome

2018-03-14 00:49
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

wayneeseguin
2018-03-14 13:36
:smile:

zehicle
2018-03-14 14:52
Good morning @wayneeseguin

lcrozzoli
2018-03-14 15:13
Hello and thanks to all. i'm very glad to be here

nkabir
2018-03-14 16:08
Hello, all. I've managed to navigate the documentation well enough to create a bundle, upload it, and successfully provision machines. It works beautifully and I appreciate everyone's hard work. I have two questions about the process: 1) I'd like to customize the ubuntu preseed "late_command" but cannot find references to it in the documentation. Nor could I find references to it in the standard collection of parameters (I used "select-kickseed" to customize other parts of the installation which worked perfectly). What is the preferred way to override "late_command"? 2) I have a "discover -> prep-install -> ubuntu-16.04-install -> finish-install -> complete-nowait" workflow to accomplish this. Is this reasonable? Apologies if I should open these questions as Github issues. I know it's always a challenge to document every possible use case. I'd like to help out any way I can so I'd be happy to post these questions as Github issues so they can be organized/triaged.

shane
2018-03-14 16:09
hi @nkabir - glad to hear you've gotten things rolling !

shane
2018-03-14 16:12
taking a look at your late_command question

shane
2018-03-14 16:17
currently - there are two paths for you: 1. with `select-kickseed` - you can define your own post install script - either replacing the `post-install.sh` call, or adding a second line after it ... 2. you can clone the ubuntu-16.04-install BootEnv, and make the `template` call changes in that - using your cloned BootEnv

shane
2018-03-14 16:17
However, if you clone the BootEnv, when we make DRP Community Content updates, you won't get those changes for that cloned BootEnv - but the original ubuntu-16.04-install BootEnv will be updated

shane
2018-03-14 16:20
it looks like we actually call "post-install.sh" twice (erroneously) - we call it once in the BootEnv definition in the `templates` section, then again in the `net-seed.tmpl`

shane
2018-03-14 16:25
since you are already using `select-kickseed`, the `net-seed.tmpl` that you (presumably) originally cloned can be modified to point to a different `post-install.sh`; and our `post-install.sh` will still run, as it's defined by the BootEnv definition to run last - and you don't want to disturb the `reset-workflow` and `runner` template calls in that, otherwise you'll get bad behavior with the Workflow/Stages and jobs.

nkabir
2018-03-14 16:28
Thank you for the clarification, @shane. I'll give that a try!

shane
2018-03-14 16:29
Also - if you are unsure exactly what a template is going to do - you can render it to see the final product - not everything can be rendered in its final state, since the context in which it was called matters in some cases

shane
2018-03-14 16:29
but there is a $faq on test rendering templates


shane
2018-03-14 16:31
the rendering works for other things too, any template defined in a BootEnv can be rendered against the "machines" url, example: http://drp.domain.com:8091/machines/817cbf29-30be-4807-b5a0-1234567890/post-install.sh

shane
2018-03-14 16:32
the trick is ... the machine must be in the Stage that the templates are defined in - so since the `ubuntu-16.04-install` Stage defines templates, put a "phantom" machine into that bootenv, and then render away

shane
2018-03-14 16:32
this is true for other stages too - I use a phantom machine to render stuff all the time
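
A small Python sketch of building that render URL, assuming only the `<endpoint>/machines/<machine-uuid>/<template-name>` pattern from the example above. The function name is hypothetical; the host and UUID are the placeholder values from the chat.

```python
from urllib.parse import quote


def render_url(base, machine_uuid, template_name):
    """Build the URL that renders a BootEnv template for one machine.

    Follows the pattern quoted above: <base>/machines/<uuid>/<template>.
    """
    return "{}/machines/{}/{}".format(
        base.rstrip("/"), quote(machine_uuid), quote(template_name)
    )


url = render_url(
    "http://drp.domain.com:8091",
    "817cbf29-30be-4807-b5a0-1234567890",
    "post-install.sh",
)
print(url)
```

The resulting URL can be fetched with curl or a browser, keeping in mind that the machine (phantom or real) has to be in a Stage or BootEnv that defines the template.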

nkabir
2018-03-14 16:37
I was able to get that far. I wanted to work within DRP preferred conventions and ensure I didn't customize too far afield of the tool's best-practices. Would my custom post-install.sh script need to follow the BootEnv post-install script's configuration of the chroot environment i.e. using a here-doc and executing in /target?

shane
2018-03-14 16:41
I'd suggest cloning the existing `net-post-install.sh.tmpl`, and work within the same conventions - just modify in the HereDoc what you want to actually perform - and change the name of the written out file from `update_system2.sh` to something like `update_norms_secret_post_install.sh` - otherwise keeping everything else intact around the HereDoc

nkabir
2018-03-14 16:41
:+1: will give that a try!

shane
2018-03-14 17:35
@nkabir - additionally you could add a new Stage, with a Task `norms-post-install`, which calls the new `post-install` script - you'd then insert that in your workflow between `ubuntu-16.04-install` and `finish-install`

nkabir
2018-03-14 19:36
@shane I like your suggestion better. It's more discoverable and self-documenting. Thanks!

shane
2018-03-14 19:36
The Stage/Task/Template solution?

shane
2018-03-14 19:38
^^^ was @greg and @vlowther reminding me that is the better path to follow ... :slightly_smiling_face:

wdennis
2018-03-15 00:55
@shane Speaking of stages and tasks, I'm having a problem with a custom Stage/Task as follows...

wdennis
2018-03-15 00:56
I made a custom stage based on `prep-install` named `necla-prep-install` which is as follows:

wdennis
2018-03-15 00:56

wdennis
2018-03-15 00:57
This uses the Task I named `totally-erase-sda` which is as follows:

wdennis
2018-03-15 00:58

wdennis
2018-03-15 00:58
I wired the stage up in a stage-map in my profile as follows:

wdennis
2018-03-15 00:59

wdennis
2018-03-15 01:01
So, as I understand it, this will boot SH, then execute the template script in the task called by the `necla-prep-install` stage, which should do the `dd` command which will zero out /dev/sda. After this completes, the system will reboot, and go into the Ubuntu 16.04 install.

wdennis
2018-03-15 01:02
The systems calling the profile with the stage map are booting SH, but then nothing is executing (no `dd` is happening.) Why might this be?

wdennis
2018-03-15 01:06
I should also mention that `necla-prep-install` is a modified clone of `prep-install`, and `totally-erase-sda` is a modified clone of `erase-hard-disks-for-os-install`

wdennis
2018-03-15 01:08
The prior stage map was calling the `prep-install` Stage, which was successfully executing the template in `erase-hard-disks-for-os-install`, but the disk wipe was not sufficient, and the Ubuntu install thereafter was failing.

wdennis
2018-03-15 01:09
I have found that if I do a `dd if=/dev/zero of=/dev/sda bs=512` then all goes smoothly with the resulting install

shane
2018-03-15 02:23
not sure off the top of my head, I'm out at dinner right now - will look at this a little bit later

wdennis
2018-03-15 14:14
Mornin', DR folk!

wdennis
2018-03-15 14:15
Any ideas on my problem above? ^^^

shane
2018-03-15 14:20
sorry - had client work I was dealing with didn't get a chance to look at it ... will try and take a peek soon

wdennis
2018-03-15 14:21
@shane Thx

wdennis
2018-03-16 00:59
OK, I give up... Pretty sure it's not a DRP problem (except for the disk wiping issue not working as above) - dd'd the disks with zeros manually and the Ubuntu preseed still crapping out. Going to try MAAS and see if I have a different experience.

zehicle
2018-03-16 01:04
I believe Ubuntu writes the partition tables in a way that is hard to undo. Greg has had to fix it for other people. I don't know the details.

wdennis
2018-03-16 01:07
@zehicle Zeroing the disk should overwrite all of that, yes? I don't believe I ever ran into this problem though using Cobbler to reinstall previously-installed machines, and of course doing a manual install via USB on a previously-used disk works fine...

wdennis
2018-03-16 01:08
It's just that I have machines to reinstall, and I can't get DRP to work reliably on used disks...

zehicle
2018-03-16 01:24
There is data written in places you can't easily find. Sorry I don't have the details. It's a Ubuntu install issue.

zehicle
2018-03-16 01:33
I'm not assuming that I google better than you -> there's something about how the partition table is written.

zehicle
2018-03-16 01:34
It took Greg a while to fix it before (I remember him describing it as "blah Ubuntu blah Drive Partitions blah install turds blah"), so I know it's a thing.

zehicle
2018-03-16 01:34
Greg never says turds, that's my color commentary

nkabir
2018-03-16 02:12
@wdennis I am running Ubuntu 16.04 installs (and re-installs) on a set of machines to familiarize myself with the tool. I started with "discover (start) -> prep-install (reboot) -> ubuntu-16.04-install (reboot) -> finish-install (reboot) -> complete-nowait (success)" before venturing into custom stages. To restart the process, I'm executing "# dd if=/dev/zero of=/dev/sda bs=1024 count=1", resetting the DRP Machine entry stage to "discover", and rebooting. It appears to re-install successfully. Have you managed to get a stock life-cycle working? I've noticed it's easy for me to make typos in "shell + template" files. I started out with MAAS until I discovered DRP. I prefer DRP.

wdennis
2018-03-16 13:53
@nkabir Yes, your workflow was just about exactly like mine -- but I find that `prep-install` doesn't do enough to clear the disk for re-use (I'm using LVM, and I'm pretty sure that's causing the issue.) I have done "stock" DRP-provided template installs, and they do work. But, I cannot stick with that due to my partitioning needs (as well as other preseed needed customizations.) I realize that RackN doesn't support debugging preseed problems, but I'm losing too much time on this trying to figure it out myself... So I'm going to do a test using MAAS and see what my experience is there. Maybe I'll quickly come back to DRP :wink:

greg
2018-03-16 13:56
`prep-install` seems to work for many. It would be good to see your workflow again @wdennis. I'm out though and will get yelled at for this message.

shane
2018-03-16 13:56
@greg - go back to your vacation

wdennis
2018-03-16 13:57
Wow, that didn't take long...

greg
2018-03-16 13:58
@nkabir - I think you can replace the `prep-install(reboot)` with `prep-install(success)` - one less reboot that way. `finish-install(reboot)` should be `finish-install(stop)`, but with the tip code base it will "fix" it for you.

mark.yeun
2018-03-16 14:45
hi guys i'm back. I had dr-provision working well with mellanox cards, but now on a machine with SolarFlare, I'm getting stuck.

mark.yeun
2018-03-16 14:46
I have the option 67 override, which I needed for mellanox: `67: "{{if (eq (index . 77) \"iPXE\") }}default.ipxe{{else if (eq (index . 93) \"0\")}}ipxe.pxe{{else}}ipxe.efi{{end}}"`

mark.yeun
2018-03-16 14:47
I have only serial console to the machine I'm trying to provision. the screen goes blank.

mark.yeun
2018-03-16 14:47
i have a tcpdump on the drp box, and it shows tftp transfer of ipxe.pxe, then nothing

mark.yeun
2018-03-16 14:53
Ah I got someone to take a pic of the screen. I have this, and it froze ```PXE->EB: !PXE at (AC0:0790, entry point at 9AC0:0248 UNDI code segment 9AC0:0838, data segment 9B44:1c?? UNDI device is PCI 06:00.0, type gPXE 619kB free base memory after PXE unload Oops! Unable to find realmode segment```

nkabir
2018-03-16 15:00
@greg Thank you. That will save me some time as I iterate through my set up! Now back to enjoying Mai Tais (or equivalent)! @wdennis I am using LVM on a single disk but I haven't ventured into customizing partitioning yet. My staging machines have a single boot disk (`/dev/sda`) and three additional disks that are configured for ZFS/RAIDZ by Ansible once the DRP provisioning sets up the single boot disk. Ansible handles the partitioning of ZFS. I just need the boot disk to host the OS.

mark.yeun
2018-03-16 15:02
hm removing option 67 worked for this box.

mark.yeun
2018-03-16 15:03
so hmm. lpxelinux.0 freezes my mellanox box and ipxe.pxe freezes my solarflare box

mark.yeun
2018-03-16 15:03
that puts a skunk in the works

shane
2018-03-16 15:03
gotta love the PXE "_standards_" ...

mark.yeun
2018-03-16 15:03
do you have any tricks up your sleeve?

shane
2018-03-16 15:04
@greg or @vlowther might have some iPXE script options to help - unfortunately, I'm not sure how to implement that right to work for both cards

vlowther
2018-03-16 15:09
What nic is in that solarflare box?

mark.yeun
2018-03-16 15:10
solarflare is the nic

vlowther
2018-03-16 15:12
...

vlowther
2018-03-16 15:12
Interesting.

mark.yeun
2018-03-16 15:12
sorry, having trouble with copy/paste

mark.yeun
2018-03-16 15:12
Solarflare SFC9020

mark.yeun
2018-03-16 15:13
I see option 67 has template interpolation -- would we be able to switch on gohai inventory?

vlowther
2018-03-16 15:13
I assume they work fine once we get booted into Sledgehammer?

mark.yeun
2018-03-16 15:13
although that doesn't help for discovery

mark.yeun
2018-03-16 15:13
yup

vlowther
2018-03-16 15:14
Alas, the template interpolation that happens for option 67 only has access to the incoming DHCP packet.

vlowther
2018-03-16 15:16
hm.

vlowther
2018-03-16 15:16
What firmware are those solarflare nics running?

mark.yeun
2018-03-16 15:16
soo, we were pxe booting both types with our own pxelinux.0 via tftp

mark.yeun
2018-03-16 15:17
maybe I can tweak option 67 to have it try pxelinux.0 instead of lpxelinux.0 / ipxe.pxe

vlowther
2018-03-16 15:17
Trolling through the firmware release notes indicates that they have had a few firmware updates to fix PXE related issues.

mark.yeun
2018-03-16 15:18
trying to dig up the version


vlowther
2018-03-16 15:18
release notes for their latest utility bundle.

vlowther
2018-03-16 15:19
It looks like they use an embedded gPXE for their PXE booting needs.

mark.yeun
2018-03-16 15:19
i have this so far, still trying

mark.yeun
2018-03-16 15:19
```Solarstorm Boot Manager (v3.2.0.6061) Solarflare Communications 2008-2010 ```

mark.yeun
2018-03-16 15:20
reading those release notes

vlowther
2018-03-16 15:30
heh, later releases allow you to embed your own custom ipxe ROM image.

vlowther
2018-03-16 15:30
Could have all sorts of fun with that. :slightly_smiling_face:

mark.yeun
2018-03-16 15:32
ah trying to avoid that kind of fun for now :slightly_smiling_face:

mark.yeun
2018-03-16 15:32
``` Firmware version: v3.2.1 Controller type: Solarflare SFC9000 family Controller version: v3.2.0.6071 Boot ROM version: v3.2.0.6061 ```

mark.yeun
2018-03-16 15:32
quite far behind

vlowther
2018-03-16 15:33
In the mean time, though, if you have a version of pxelinux and/or ipxe that work for both systems in question you could just chuck them into /var/lib/dr-provision/tftpboot and rewrite your option 67 to use them instead.

mark.yeun
2018-03-16 15:35
ok thanks

vlowther
2018-03-16 15:35
or...

vlowther
2018-03-16 15:36
sorry, bad advice.

vlowther
2018-03-16 15:38
Make /var/lib/dr-provision/replace, put the files in it, and name them the same as their respective files in /var/lib/dr-provision/tftpboot

vlowther
2018-03-16 15:39
sorry, I am in the bad habit of just recompiling when I want to test different embedded assets

mark.yeun
2018-03-16 15:40
so mkdir /var/lib/dr-provision/replace; cp pxelinux.0 /var/lib/dr-provision/replace/mypxelinux.0, then change option 67 to send mypxelinux.0

vlowther
2018-03-16 15:41
no, I think it will have to be named lpxelinux.0

vlowther
2018-03-16 15:41
and you will need to include the related .c32 files for pxelinux

vlowther
2018-03-16 15:42
ipxe is easier because it will not try to load binary modules.

vlowther
2018-03-16 15:42
but I bet you will need a firmware update to get ipxe working.

mark.yeun
2018-03-16 15:44
oooh i c. so /var/lib/dr-provision/replace masks /var/lib/dr-provision/tftpboot?

mark.yeun
2018-03-16 15:44
i'm bricking, er, working on upgrading the firmware

vlowther
2018-03-16 15:45
That is the idea, yes. :slightly_smiling_face:
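
A minimal sketch of the staging step discussed above, assuming the `replace` directory behaves as described in this conversation (a file there masks the same-named file in `tftpboot`). The function name `stage_replacement` and the `DRP_ROOT` override are hypothetical and exist only so the copy can be tried against a scratch directory first. Whether dr-provision needs a restart to pick the file up is not stated here.

```shell
# Sketch only: copy a known-good boot file into the replace directory
# under the name dr-provision already serves (e.g. lpxelinux.0).
# DRP_ROOT is a hypothetical override; the default path is the one
# quoted in the conversation.
stage_replacement() {
    src=$1      # file you want served, e.g. a working pxelinux.0
    name=$2     # name it must carry, e.g. lpxelinux.0
    root=${DRP_ROOT:-/var/lib/dr-provision}

    [ -f "$src" ] || { echo "missing source: $src" >&2; return 1; }
    mkdir -p "$root/replace" || return 1
    cp "$src" "$root/replace/$name"
}
```

For pxelinux, the related `.c32` files would be staged the same way, each under its own name.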

mark.yeun
2018-03-16 15:46
does sledgehammer on serial console have password login?

vlowther
2018-03-16 15:46
root/rebar1

mark.yeun
2018-03-16 15:46
silly me did the firmware update over ssh and of course lost connection

mark.yeun
2018-03-16 15:46
omg thank you

mark.yeun
2018-03-16 15:49
YES! firmware update seems to have done it

mark.yeun
2018-03-16 15:50
@vlowther thank you for heroic advice :slightly_smiling_face:

vlowther
2018-03-16 15:50
:slightly_smiling_face:

vlowther
2018-03-16 15:51
And it looks like the latest firmware has a slew of fixes and enhancements over your older version.

mark.yeun
2018-03-16 16:01
ok i'll come back again when I get stuck

vlowther
2018-03-16 16:02
:slightly_smiling_face:

vlowther
2018-03-16 16:04
@wdennis Is it still the --force --force --I-really-mean-it thing?

vlowther
2018-03-16 16:10
@nkabir our erase script is a little more paranoid than zeroing the first megabyte: https://github.com/digitalrebar/provision-content/blob/master/content/tasks/erase-hard-disks-for-os-install.yaml

vlowther
2018-03-16 16:10
we iterate over all the vgs, forcibly erase them, do the same for all the pvs

vlowther
2018-03-16 16:11
then wipe out the first and last meg of every partition and raw block device.
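
The "first and last meg" step can be sketched roughly as below. This is not the code from the linked task, just an illustration of the idea. It assumes GNU `dd` (for `oflag=seek_bytes` and `status=none`), GNU `stat`, and `blockdev` from util-linux; the function name `wipe_ends` is hypothetical. It also accepts a regular file so it can be tried on a scratch image before pointing it at a real device.

```shell
# Sketch only: zero the first and last MiB of a block device (or of a
# regular file, for dry runs). Everything in between is left alone.
wipe_ends() {
    target=$1
    mib=1048576

    if [ -b "$target" ]; then
        size=$(blockdev --getsize64 "$target") || return 1
    else
        size=$(stat -c %s "$target") || return 1   # GNU stat
    fi

    # Tiny targets are simpler to zero in full; refuse them here.
    if [ "$size" -lt $((2 * mib)) ]; then
        echo "$target is smaller than 2 MiB, zero it entirely instead" >&2
        return 1
    fi

    # First MiB. conv=notrunc keeps a regular file at its original size.
    dd if=/dev/zero of="$target" bs="$mib" count=1 \
        conv=notrunc status=none || return 1

    # Last MiB. seek is given in bytes so an unaligned size still works.
    dd if=/dev/zero of="$target" bs="$mib" count=1 \
        seek=$((size - mib)) oflag=seek_bytes conv=notrunc status=none
}
```

Something like `wipe_ends /dev/sdX` would be run once per partition and once per raw device, after the volume groups and physical volumes have been removed.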

wdennis
2018-03-16 16:17
wat?

vlowther
2018-03-16 16:21
You had a similar issue a few weeks ago where vgremove was failing because one --force was not enough.

wdennis
2018-03-16 16:23
So my solution was to zeroize the entire disk via `dd if=/dev/zero of=/dev/sda bs=512`

wdennis
2018-03-16 16:23
That should do the trick, right?

vlowther
2018-03-16 16:24
Yes, the main reason I don't want to do that by default is because it can take hours if (for example) you are zeroing a multi-terabyte drive.

vlowther
2018-03-16 16:25
so I would prefer an approach that is a bit more targeted in what it erases

wdennis
2018-03-16 16:25
Except that it doesn't... The Ubuntu install I am trying to do afterwards still craps out...

vlowther
2018-03-16 16:26
possibly with fixes to the seed files for Debianoids that tell the install to really ignore anything that might already be on the disk.

wdennis
2018-03-16 16:26
Yes, I'd prefer a shorter wipe as well, but only if it works...

vlowther
2018-03-16 16:26
but you know all about the lack of documentation there. :confused:

wdennis
2018-03-16 16:27
I was using your `prep-install` stage but was experiencing faults when the installer got to the partitioning step

wdennis
2018-03-16 16:28
The interesting thing is that a standard USB install (i.e. a regular ISO installer) doesn't complain about anything when you use a used disk

wdennis
2018-03-16 16:28
It just partitions however you told it to and moves on

vlowther
2018-03-16 16:28
So it probably knows about seed options we don't

wdennis
2018-03-16 16:28
yes

wdennis
2018-03-16 16:29
I need a specific partitioning to support how we want to configure the boot/root disk

wdennis
2018-03-16 16:30
It's basically:

wdennis
2018-03-16 16:30
```
d-i partman-auto/expert_recipe string root_home_lvm : \
    500 533 1024 free \
        $iflabel{ gpt } $reusemethod{ } \
        method{ efi } format{ } \
    . \
    1024 1024 1024 ext2 \
        $primary{ } method{ format } \
        format{ } use_filesystem{ } filesystem{ ext2 } \
        mountpoint{ /boot } \
    . \
    1 1073741824 -1 ext3 \
        $defaultignore{ } method{ lvm } \
        vg_name{ vg00 } \
    . \
    50% 20 100% linux-swap \
        $lvmok{ } \
        lv_name{ lv_swap } in_vg{ vg00 } \
        method{ swap } format{ } \
    . \
    204800 204800 204800 ext4 \
        $lvmok{ } in_vg{ vg00 } \
        lv_name{ lv_root } method{ format } \
        format{ } use_filesystem{ } filesystem{ ext4 } \
        mountpoint{ / } \
    . \
    512000 512000 512000 ext4 \
        $lvmok{ } in_vg{ vg00 } \
        lv_name{ lv_home } method{ format } \
        format{ } use_filesystem{ } filesystem{ ext4 } \
        mountpoint{ /home } \
    .
```

wdennis
2018-03-16 16:31
So it makes LV's for "/", swap and "/home"

wdennis
2018-03-16 16:31
The idea is to make it partition any size of disk (we have very hetero hardware here...)

tim_epkes
2018-03-16 16:32
has joined #json

wdennis
2018-03-16 16:32
Basically make a reasonable-sized "/" LV, reasonable-sized swap LV, then all the rest of the VG space goes to the "/home" LV

vlowther
2018-03-16 16:34
yes

vlowther
2018-03-16 16:35
The hard part is finding the right partman-lvm, partman-md, and other related options to make it really clear everything and not ask questions

wdennis
2018-03-16 16:36
We need that because the researchers get local home dir's on the servers, and tend to pile lots of data in them... If "/home" was a part of the "/" LV, they run the OS partition out of space and crash the servers...

wdennis
2018-03-16 16:36
It's *so* easy in feekin' kickstart... Preseed is so obtuse.

wdennis
2018-03-16 16:37
But, the researchers all want Ubuntu OS...

gary.berger
2018-03-16 16:51
has joined #json

spector
2018-03-16 16:51
@tim_epkes $welcome

2018-03-16 16:51
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

tim_epkes
2018-03-16 16:51
Thanks

fpisupati
2018-03-16 18:03
has joined #json

nkabir
2018-03-16 18:06
@vlowther Yes. It's very thorough! I'm repeatedly cycling through the whole process in my staging environment to get more comfortable with the moving parts. I run your erase stage in my workflow. However, I only erase the first megabyte when I want a machine to return to discovery via PXE. Then the workflow takes over.

nkabir
2018-03-16 18:12
@nkabir uploaded a file: https://rackn.slack.com/files/U8BTZ6HPT/F9RAUAV4J/-.sh and commented: @wdennis with your successful manual install, have you dumped the working machine's configuration and compared it with your DRP template?

zehicle
2018-03-16 18:16
@fpisupati $welcome

2018-03-16 18:16
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

fpisupati
2018-03-16 19:15
thanks

2018-03-16 19:35

romain.lafontaine
2018-03-16 20:25
Random/unrelated: Was browsing your website and found a nice pic of you @shane @zehicle @greg https://www.rackn.com/company/ ^^ I felt it was worth sharing with the community

shane
2018-03-16 20:26
LOL ... yeah, well @zehicle had a "wardrobe malfunction" and didn't end up wearing his kilt that day ... that's @vlowther in the middle, @greg on the right, and /me on the left ...

romain.lafontaine
2018-03-16 20:26
^^

spector
2018-03-16 20:27
These are unapproved marketing show uniforms :grinning:

shane
2018-03-16 20:27
s/unapproved/maverick/g

greg
2018-03-16 20:28
The strange part is that the merely average guy is short.

greg
2018-03-16 20:28
And comfortable

patrick.miller
2018-03-16 21:16
has joined #json

zehicle
2018-03-16 21:17
@patrick.miller $welcome

2018-03-16 21:17
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

patrick.miller
2018-03-16 21:17
thanks zehicle

shane
2018-03-16 21:17
indeed, welcome, @patrick.miller :slightly_smiling_face:

patrick.miller
2018-03-16 21:18
how can I display the kickstart file for a machine?

shane
2018-03-16 21:18
the rendered one - or the template that builds the final KS ?

patrick.miller
2018-03-16 21:18
rendered


patrick.miller
2018-03-16 21:19
excellent thanks!

shane
2018-03-16 21:19
there are a lot of nuggets in the $faq


shane
2018-03-16 21:20
(oops, I need to update Slackbot responses to point to `tip` doc instead of `latest`)

shane
2018-03-16 21:21
ok - done: $quickstart

2018-03-16 21:21

shane
2018-03-16 21:31
@patrick.miller in a short bit, the `tip` docs will update with some clean ups and enhancements to that Render doc - but the info is correct as you see it right now

shane
2018-03-16 23:44
@wdennis - I got a chance to look at your Erase SDA issue ... the problem is you set `sane-exit-codes` - we expect appropriate Exit Codes to mark success. However, our "insane" (aka older) Exit Code definitions also assume an Exit Code of 0 (zero) for success. The `dd` command will always exit with code `1` and the message "out of disk space". This causes the Agent/Runner to mark the job as failed, and your Workflow won't advance. Explicitly setting an `exit 0` at the end of the scriptlet will fix your issue. For reference, I am attaching a content pack that contains the testing I did. NOTE - I moved your scriptlet out of the payload of the Task and into a separate Template, as it makes it easier to extend/edit/track/modify, etc. But the principle should be the same (I did not test using the scriptlet embedded in the task w/ a modified `exit 0`). I'm including 2 workflows - your original NECLA and a modified `rebar-prep` workflow that does the same thing, but uses the Digital Rebar/RackN `erase-hard-disks-for-os-install` Task. If this needs modification to work right, we'd appreciate the feedback on what needs to change. It's 1000s of times faster than a pure-`dd` solution on bigger disks.
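
The fix described above is a plain `exit 0` at the end of the scriptlet. As a slightly more selective variant (not what was tested in the attached content pack), the sketch below treats only the expected end-of-device failure as success and still reports any other `dd` error. The function name `zero_fill` is hypothetical, and the match assumes GNU `dd`'s English "No space left on device" message, which is why `LC_ALL=C` is forced.

```shell
# Sketch only: zero a whole device and treat "ran off the end of the
# device" as success, while still failing on any other dd error.
zero_fill() {
    target=$1

    # With no count=, dd writes until the device is full and then exits
    # non-zero. Capture stderr so the reason can be inspected.
    err=$(LC_ALL=C dd if=/dev/zero of="$target" bs=1M 2>&1) && return 0

    case $err in
        *"No space left on device"*)
            return 0 ;;                       # expected end of device
        *)
            printf '%s\n' "$err" >&2          # anything else is a real failure
            return 1 ;;
    esac
}
```

In a scriptlet this would be called as `zero_fill /dev/sda` and followed by `exit $?`, so the Agent/Runner sees 0 only when the wipe finished the way it was supposed to.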

shane
2018-03-16 23:46
@shane uploaded a file: https://rackn.slack.com/files/U6QFVRJNB/F9REST4BE/NECLA_and_Rebar_wipe_prep_content_.yaml and commented: Use like regular content pack. First delete your conflicting content with the same names. Then: `drpcli contents create -< necla-and-rebar-prep.yaml` Attach the appropriate Workflow stagemap to your machines, and boot as usual.

2018-03-17 01:29

scsikid
2018-03-17 02:19
has joined #json

shane
2018-03-17 03:18
@scsikid $welcome !

2018-03-17 03:18
Digital Rebar welcome information is here > http://rebar.digital/community/welcome.html

patrick.miller
2018-03-17 03:39
ah ok thanks Shane.

scsikid
2018-03-17 17:16
thanks @shane